ABSTRACT

This thesis discusses a generalized problem of stochastic control, 
in which multiple controllers with different data bases are present. 
The vehicle for the investigation is the finite-state, finite-memory 
(FSFM) stochastic control problem. For this problem, the usual 
technique of stochastic dynamic programming does not apply. Instead, 
optimality conditions are obtained by deriving an equivalent 
deterministic optimal control problem. 

A FSFM minimum principle is obtained via the equivalent deterministic 
problem. The minimum principle suggests the development of a 
numerical optimization algorithm, the min-H algorithm. The relationship
between the sufficiency of the minimum principle (which is in
general only a necessary condition) and the informational properties 
of the problem is investigated. 

Dynamic programming functional equations for the FSFM problem are 
also obtained from the equivalent deterministic problem. Both the 
finite and infinite horizon cases are considered. Numerical 
solution of the functional equations is discussed. 

To illustrate the general theory, a problem of hypothesis testing 
with 1-bit memory is investigated. The discussion illustrates the 
application of control theoretic techniques to information processing 
problems.

CHAPTER I


INTRODUCTION 

1.1 Stochastic Control and Large Scale Systems

The fundamental problem of control engineering is illustrated
in Figure 1.1.1. A fixed plant is given with certain variables
(inputs) available for manipulation and other variables (outputs) 
available for observation. A controller must be designed to 
choose the plant inputs based on the observations so that the plant 
behaves in a desired fashion. 

In deriving a mathematical model for the plant, phenomena 
which cannot be adequately explained by simple deterministic models 
are commonly treated as stochastic disturbances. Stochastic 
optimal control theory has been developed for problems of this type. 
Although the theory has so far been unsuccessful in producing
explicit solutions to practical non-linear problems, it nevertheless
provides a useful perspective and a convenient framework for
deriving suboptimal, but practical and feasible, policies.

Consider for example the Safeguard ballistic missile defense 
system, which can be considered a large stochastic control problem. 
Of course, the problem is too complicated to be solved in this 
formulation, but parts of the problem are tractable. Just to
cite one example, the Kalman filtering theory is used for the 
tracking function. But of more fundamental importance is the 
perspective available from adopting the stochastic control viewpoint: 
the state space formalism, the explicit treatment of uncertainty, 
the identification of the computer with the controller and the 
radar and missile sites as the sensors and actuators, and the 
explicit statement of system goals with their relative importance. 

While stochastic control has doubtless been useful for certain 
problems, there has recently been an increase in interest in the 
more difficult problems of large scale engineering systems. 

These systems (Figure 1.1.2) are characterized by the presence of 
multiple controllers acting on different data bases and affecting 
different aspects of total system performance. Since classical 
stochastic control theory is restricted to systems with a single 
controller possessing perfect memory of all past sensor outputs 
and actuator inputs (the so-called classical information pattern),
the need for a generalized theory is apparent. Such a theory must 
subsume classical stochastic control, so that explicit optimal 
solutions to realistic design problems cannot be expected. But 
what can be accomplished is the establishment of a framework in which 
the information interface problems that arise in multiple controller 
systems can be viewed. 
1.2 Background

Tentative steps in the direction of a generalized theory of
stochastic control have been taken by a number of workers. Inspiration
has come from a number of fields other than classical stochastic
control theory. These include the theories of games, statistical
decisions, multilevel hierarchical systems, teams, and communications.

The crucial issue in generalized stochastic control is the
interaction between information and decision. This issue arises
unavoidably in the von Neumann game theory [V1, L1, O1] due to the
presence of more than one player. Unfortunately, attention in game
theory has focused on the so-called normal form of the game. In
this form the dynamical and informational aspects of the game are
suppressed by introduction of the notion of strategy. A generalized
stochastic control problem can be considered a non-zero-sum game,
and so it has a normal form. Of course, no insight is gained from
this reduction. More useful for non-classical stochastic control
is the work that game theorists have performed on the extensive form
of the game [K1, K2, D1, T1]. It is here, for example, that the
important notion of the information pattern arises.

Another area in which the issue of the interaction between
information and decision naturally arises is statistical decision
theory. Statistical decision theory is a mathematical discipline that
resulted from the infusion of ideas of game theory into the more
traditional statistical theory of Fisher, Neyman and Pearson, and their
followers. The synthesis was largely performed by Abraham Wald, and
culminated in his book Statistical Decision Functions [Wa1]. Wald's
formulation is still important, but an alternative formulation by the
Bayesian statisticians has grown in popularity [Sa1, Ra1].

Statistical decision theory contains several ideas important in
stochastic control. One example is the notion of a sufficient
statistic. Another example is contained in Wald's treatment of the
sequential problem. This treatment contains ideas of dual control
and of dynamic programming.

The theory of multilevel hierarchical systems is due to
Mesarovic [Me1], who drew inspiration from the study of decentralized
structures in economics and management [Ar1, Soil] and from large
scale mathematical programming [La1, Wis1]. However, Mesarovic's
model is deterministic and problems of information flow appear only
implicitly. More recently Chong [C1] has investigated a stochastic
version of a two-level hierarchical system in which the interaction
between information and control appears explicitly.

Team theory [M1, M2, R1] is closely related to statistical
decision theory. According to Radner [R1], team theory arose from
"... attempts by several workers to analyze some of the many-person
aspects of organizations that are present even in the absence of
many-person game complications...". Team theory is actually a
special static case of non-classical stochastic control. It is
important since an explicit solution to the quadratic-Gaussian
team problem is known, so that the relative efficacy of different
information structures can be compared [R2].

Communications theory is another area that is a special case of
non-classical stochastic control. There are two controllers in a
communication problem, the encoder and the decoder. By the very
nature of the problem, the decoder does not know either the
observations (source outputs) or controls (channel inputs) of the
encoder. Information theory was invented by Shannon [Sh1, Ga1] to
deal with problems of this nature.

Although non-classical stochastic control theory has drawn
inspiration from a number of cognate disciplines, it is undeniably
a direct outgrowth of a critical look at the foundations of classical
stochastic control performed by several authors. The fundamental
theoretical tool in stochastic control is the dynamic programming
algorithm [B1, F1, Ao1, H1]. Although the algorithm can only be
explicitly carried out in certain special cases, it nevertheless
provides a convenient conceptual framework in which theoretical
questions of existence, uniqueness, randomization, etc. can be posed
and answered. The critical underlying assumption for the validity
of dynamic programming is the classical information pattern: one
controller with access to all past observations and controls
[Ch1, St1]. Thus an examination of the foundations of dynamic
programming suggests the non-classical stochastic control problem
as an extension.

Explicit consideration of non-classical stochastic control
theory began with the work of Witsenhausen [W1, W2, W3, W4].
In [W1], Witsenhausen gave an example of a linear-quadratic-Gaussian
(LQG) stochastic control problem for which the optimal control laws
are nonlinear. In [W2], he examined the fundamental issue
of when a general stochastic control problem (or game) is
well-posed. In [W3], the status of the new theory was surveyed, with
the introduction of a useful system of notation and the listing
of a number of "assertions" which might be turned into theorems
by appropriate technical assumptions. In [W4], a maximum
principle (for control laws) was derived.

Non-classical stochastic control has drawn the attention of
other workers. Athans and a number of his students have
investigated suboptimal solutions to certain non-classical
problems [C2, Kw1, Ca1]. Y.C. Ho and his student K.C. Chu have
classified information patterns and identified some for which the
optimal control laws for the LQG case are linear [Ho1, Chu1]. Aoki
has found a suboptimal solution for the control sharing information
pattern [Ao2]. Bismut has given an example in which the
interaction between information and control is clearly exhibited
[Bi1], and Sandell and Athans [S1] have used Bismut's idea to
explicitly characterize the optimal nonlinear solution of the
control-sharing LQG stochastic control problem.
1.3 Summary of Thesis

Research in non-classical stochastic control to date has been
handicapped by the absence of a rich class of tractable examples.
In classical stochastic control, the LQG (linear-quadratic-Gaussian)
problems are a readily solvable class useful for motivation and
for practical applications. Unfortunately, the non-classical LQG
problem is difficult, and its solution is unknown [W1, S1].

The present work is aimed at easing this difficulty. Attention
is restricted to the case of finite-state, finite-memory (FSFM)
stochastic systems. For these problems, an elegant and elementary
theory can be developed. The optimality conditions for these
problems have a special structure that can be exploited to develop
numerical optimization techniques. Evidence of the importance of
the problem is given by the interest in a special case of the
problem in the operations research literature [How1, How2].

The FSFM model is introduced in Chapter II. It is demonstrated
that a number of apparently more general problems can be
reduced to FSFM stochastic control problems. An example of a FSFM
problem is given that illustrates the important notion of a
signaling strategy. The chapter concludes with the derivation
of a deterministic optimal control problem equivalent to the FSFM
stochastic control problem.

A FSFM minimum principle is derived in Chapter III. The
minimum principle is a necessary condition for optimality,
but is not sufficient in general, as is shown by a simple
example. However, in the absence of signaling strategies,
the minimum principle can be strengthened to give a sufficient
condition. A numerical optimization algorithm, the Min-H
algorithm, is developed based on the minimum principle.

The dual dynamic programming functional equations for forward 
and backward induction are stated in Chapter IV. Several 
approaches to the numerical solution of these equations are 
suggested, and their implementation is illustrated by an example. 

Chapter V considers the infinite horizon version of the 
FSFM problem. The Value and Policy Iteration methods are derived 
for a version of the problem with discounted cost, and their 
numerical implementation is discussed. Policy Iteration is illustrated
by an example. This example has the interesting property that
the optimal control law sequence is non-stationary.

In Chapter VI, the problem of hypothesis testing of Bernoulli 
trials with a 1-bit memory is considered. Application of the 
minimum principle suggests a class of non-obvious, but intuitively 
desirable strategies. This result provides considerable
justification for the use of control-theoretic methods in
information theoretic problems.

Chapter VII consists of conclusions and suggestions for future
research.
1.4 Contributions of Thesis

The major contributions of this research are:

(1) The formulation of the FSFM problem. 

(2) The minimum principle and the person-by-person 
min-H algorithm for the FSFM problem. 

(3) The relation of the information properties of the 
FSFM problem to the optimality conditions. 

(4) The extension of the Sondik algorithm to the FSFM 
problem. 

(5) Formulation of the infinite horizon FSFM problem with 
discounting. 

(6) Extension of Value and Policy Iteration methods to 
the FSFM problem. 

(7) Extension of Sondik's implementation of Policy Iteration
to FSFM problems.

(8) Demonstration of the potential value of control-theoretic
methods in information handling systems via the hypothesis
testing problem.



CHAPTER II 


THE FSFM STOCHASTIC CONTROL PROBLEM

In this chapter, the finite-state, finite-memory stochastic
control problem is introduced. It is shown that FSFM problems are
a fairly general class of non-classical stochastic control problems.
An example is given illustrating the interesting signaling strategies
that occur in FSFM problems. The chapter concludes with the development
of a deterministic optimal control problem equivalent to the FSFM problem.

2.1 Formulation 

The systems studied are described by the state equation 

    $x(t) = f_t(x(t-1), u(t), q(t))$                                  (2.1.1)

where $x(t) \in X_t$ for $t = 0, 1, 2, \ldots, T$ and $u(t) \in U_t$,
$q(t) \in Q_t$ for $t = 1, 2, \ldots, T$. The finite sets $X_t$, $U_t$,
and $Q_t$ are referred to as the state set, the input set, and the
uncertainty set, respectively. Associated with the state equation is
a cost function

    $J = \phi(x(T)) + \sum_{t=1}^{T} h_t(x(t-1), u(t))$               (2.1.2)

where $h_t : X_{t-1} \times U_t \to R$ and $\phi : X_T \to R$
($R$ = real numbers).

The interpretation of the equations is as follows. The state
equation (2.1.1) models some controlled, uncertain physical process.
The variables x(t) represent the possible states of the process, the
variables u(t) are the inputs of the controller, and the variables q(t)
represent the stochastic effects present. The system's performance is
measured by its cost of operation as expressed by (2.1.2).

The designer's problem is to specify the system controller. 

The controller is specified by a sequence of control laws

    $\gamma_t : X_{t-1} \to U_t, \quad t = 1, 2, \ldots, T.$          (2.1.3)

The interpretation of the control law is that when the process is in
state $x(t-1)$, the controller applies input $u(t) = \gamma_t(x(t-1))$.
The fact that not all control laws are feasible (due to various physical
constraints) is recognized by specifying the set of admissible control
laws

    $\Gamma_t \subset U_t^{X_{t-1}}$                                  (2.1.4)

at time t, t = 1,2,...,T, where $U_t^{X_{t-1}}$ denotes the set of all
maps from $X_{t-1}$ to $U_t$. The designer is constrained to choosing
$\gamma = (\gamma_1, \ldots, \gamma_T) \in \Gamma$, where

    $\Gamma = \Gamma_1 \times \Gamma_2 \times \cdots \times \Gamma_T.$   (2.1.5)

An admissible control law sequence $\gamma \in \Gamma$ will be called a design, and the
set $\Gamma$ will be referred to as the set of admissible designs.

The design $\gamma$ should be chosen so that the system operates with
minimum cost. Notice, however, that the cost (2.1.2) of operation of
the system is not determined solely by $\gamma$, but depends on the (uncertain)
values of x(0), q(1), ..., q(T). The difficulty is resolved by adopting
a Bayesian viewpoint: all the uncertain variables are assumed to be
random variables with a known joint probability distribution.
In the FSFM model, it is assumed that probability functions
$\pi_0 : X_0 \to [0,1]$ and $p_t : Q_t \to [0,1]$ are given. The probability space
$(\Omega, \mathcal{F}, P)$ is then defined as follows. The sample space $\Omega$ and field of
events $\mathcal{F}$ are

    $\Omega = X_0 \times Q_1 \times Q_2 \times \cdots \times Q_T$     (2.1.6)

    $\mathcal{F} = \mathcal{P}(\Omega)$                               (2.1.7)

where $\mathcal{P}(\Omega)$ is the power set of $\Omega$ (the set of all subsets of $\Omega$). The
probability of a point $\omega = (x_0, q_1, q_2, \ldots, q_T) \in \Omega$ is

    $P(\{\omega\}) = \pi_0(x_0)\, p_1(q_1)\, p_2(q_2) \cdots p_T(q_T)$   (2.1.8)

and the probability of an arbitrary event of $\mathcal{F}$ is the sum of the
probabilities of its points.

Given $\gamma \in \Gamma$, the corresponding expected value of $J$ can be computed
in several ways. Define

    $X = X_0 \times X_1 \times \cdots \times X_T,$                    (2.1.9)

    $U = U_1 \times U_2 \times \cdots \times U_T.$                    (2.1.10)

The system of feedback equations

    $x(t) = f_t(x(t-1), \gamma_t(x(t-1)), q(t)), \quad t = 1, 2, \ldots, T$   (2.1.11)

has a unique solution $(x(0), x(1), \ldots, x(T)) \in X$ for each
$(x(0), q(1), \ldots, q(T)) \in \Omega$. This is a trivial consequence of the
causal nature of the setup.^1


Thus, there exists a unique solution map

    $S_\gamma : \Omega \to X$                                         (2.1.12)

that gives the sequence of states resulting from a given design and a
given sequence of stochastic inputs. Defining

    $\mathcal{X} = \mathcal{P}(X),$                                   (2.1.13)

a probability space $(X, \mathcal{X}, P S_\gamma^{-1})$ is defined, where the probability
$P S_\gamma^{-1}$ is defined by

    $P S_\gamma^{-1}(x) = P(\{\omega : x = S_\gamma(\omega)\}).$      (2.1.14)

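Similarly, define the map $R_\gamma : \Omega \to X \times U$ by

    $R_\gamma = S_\gamma \times (\gamma \circ S_\gamma).$             (2.1.15)

The corresponding probability space is
$(X \times U, \mathcal{X} \times \mathcal{U}, P R_\gamma^{-1})$, where
$P R_\gamma^{-1}$ is defined in a similar fashion to $P S_\gamma^{-1}$.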


Recall that the cost function $J$ is a map $J : X \times U \to R$. Let
$i : X \to X$ be the identity map; then maps $J_\gamma : \Omega \to R$ and
$\hat{J}_\gamma : X \to R$ can be defined by

    $J_\gamma = J \circ R_\gamma,$                                    (2.1.16)

    $\hat{J}_\gamma = J \circ (i \times \gamma).$                     (2.1.17)


1 The crucial importance of the concept of causality in assuring the
existence of solutions to feedback equations has been demonstrated
(for quite different models) by Witsenhausen [W2] and Willems [Wi1].
Finally, the expected value of $J$ can be computed in the following
three ways^1:

    $E J_\gamma = \int_\Omega J_\gamma(\omega)\, dP(\omega),$         (2.1.18)

    $E_\gamma J = \int_{X \times U} J(x,u)\, dP R_\gamma^{-1}(x,u),$  (2.1.19)

    $E_\gamma \hat{J}_\gamma = \int_X \hat{J}_\gamma(x)\, dP S_\gamma^{-1}(x).$   (2.1.20)

But by Theorem 39.C of Halmos [Ha1],

    $E J_\gamma = E_\gamma J = E_\gamma \hat{J}_\gamma \triangleq J(\gamma).$   (2.1.21)
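
The equality of the three expectations can be checked numerically on a small instance. The following is only a sketch under stated assumptions: the sets, the probabilities, the state equation `f`, the design `gamma`, and the cost `J` are hypothetical choices made for illustration, and the code indexes time from 0 to T-1 where the text uses 1 to T.

```python
from collections import defaultdict
from itertools import product

T = 2
X0 = [0, 1]                      # initial-state set (assumed)
Q = [0, 1]                       # uncertainty set, same at every time (assumed)
pi0 = {0: 0.3, 1: 0.7}           # distribution of x(0) (assumed)
p = {0: 0.6, 1: 0.4}             # distribution of each q(t) (assumed)

def f(x, u, q):
    """Hypothetical state equation on {0, 1}."""
    return (x + u + q) % 2

def gamma(t, x):
    """Hypothetical design: copy the state at step 0, invert it at step 1."""
    return x if t == 0 else 1 - x

def J(xs, us):
    """Hypothetical cost defined on state and input sequences."""
    return sum(xs) + 0.5 * sum(us)

def prob(omega):
    """Probability of a sample point, as in (2.1.8)."""
    x0, *qs = omega
    value = pi0[x0]
    for q in qs:
        value *= p[q]
    return value

def S(omega):
    """Solution map: sample point -> state sequence, via (2.1.11)."""
    x0, *qs = omega
    xs = [x0]
    for t in range(T):
        xs.append(f(xs[-1], gamma(t, xs[-1]), qs[t]))
    return tuple(xs)

def inputs(xs):
    """Input sequence generated by the design along a state sequence."""
    return tuple(gamma(t, xs[t]) for t in range(T))

def R(omega):
    """Map of (2.1.15): sample point -> (state sequence, input sequence)."""
    xs = S(omega)
    return xs, inputs(xs)

Omega = list(product(X0, *([Q] * T)))

# (2.1.18): integrate over the sample space.
E1 = sum(J(*R(w)) * prob(w) for w in Omega)

# Induced measures on X x U and on X.
PR = defaultdict(float)
PS = defaultdict(float)
for w in Omega:
    PR[R(w)] += prob(w)
    PS[S(w)] += prob(w)

# (2.1.19) and (2.1.20): integrate against the induced measures.
E2 = sum(J(xs, us) * mass for (xs, us), mass in PR.items())
E3 = sum(J(xs, inputs(xs)) * mass for xs, mass in PS.items())
print(E1, E2, E3)
```

The three printed values agree up to rounding, which is the finite-space content of the change-of-variables theorem cited in the text.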


Thus the FSFM stochastic control problem is to find
$\min_{\gamma \in \Gamma} J(\gamma)$ and the minimizing control law sequence $\gamma^*$.

Since $X$ and $U$ are finite, it is clear that $\Gamma$ is finite. Therefore,
the cost functional $J(\gamma)$ can in principle be evaluated for each
$\gamma \in \Gamma$, and the result tabulated. Since a finite set of real numbers
always has a minimum, an optimal control law sequence exists, although
it may not be unique. Moreover, since a convex combination of a finite
set of real numbers cannot be less than the smallest such number, it is
clear that randomized designs offer no advantage.^2
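
The enumeration argument can be made concrete with a brute-force search. This is a minimal sketch, assuming a hypothetical two-state, two-step problem; the sets, probabilities, the functions `f`, `h`, `phi`, and all helper names are assumptions for illustration only. Control laws are stored as tuples indexed by the state, and time runs from 0 to T-1 in the code.

```python
from itertools import product

T = 2
X = [0, 1]                       # state set, same at every time (assumed)
U = [0, 1]                       # input set (assumed)
Q = [0, 1]                       # uncertainty set (assumed)
pi0 = {0: 0.5, 1: 0.5}           # distribution of x(0) (assumed)
p = {0: 0.8, 1: 0.2}             # distribution of each q(t) (assumed)

def f(x, u, q):
    """Hypothetical state equation: input and noise each flip the state."""
    return x ^ u ^ q

def h(x, u):
    """Hypothetical stage cost: state 1 and input 1 are both penalized."""
    return 1.0 * x + 0.3 * u

def phi(x):
    """Hypothetical terminal cost."""
    return 2.0 * x

def all_laws():
    """Every map X -> U, stored as a tuple indexed by the state."""
    return list(product(U, repeat=len(X)))

def expected_cost(design):
    """J(gamma): sum over sample points of probability times realized cost."""
    total = 0.0
    for x0 in X:
        for qs in product(Q, repeat=T):
            prob = pi0[x0]
            for q in qs:
                prob *= p[q]
            x, cost = x0, 0.0
            for t in range(T):
                u = design[t][x]         # u = gamma_t(previous state)
                cost += h(x, u)
                x = f(x, u, qs[t])
            total += prob * (cost + phi(x))
    return total

def best_design(gamma_sets):
    """Exhaustive search over the product of the admissible sets."""
    return min(product(*gamma_sets), key=expected_cost)

closed_loop = [all_laws()] * T                      # every law admissible
open_loop = [[(u,) * len(X) for u in U]] * T        # constant laws only

g_cl = best_design(closed_loop)
g_ol = best_design(open_loop)
print(g_cl, expected_cost(g_cl))
print(g_ol, expected_cost(g_ol))
```

Because the open-loop set is a subset of the closed-loop set, the first cost printed can never exceed the second. The search is exponential in the number of states and steps, so it serves only as a check on very small problems.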


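1 For the finite spaces considered here,

    $\int_X f(x)\, dP(x) = \sum_{x \in X} f(x) P(x).$

Notice the notation $E J_\gamma$, $E_\gamma J$, $E_\gamma \hat{J}_\gamma$ indicating the dependence of the
function and/or probability measure on $\gamma$.

2 A randomized design is a sequence $\lambda_\gamma$, $\gamma \in \Gamma$, of numbers satisfying
$\lambda_\gamma \geq 0$, $\sum_{\gamma \in \Gamma} \lambda_\gamma = 1$. If $\Lambda$ is the set of such sequences, $J$ is
extended to $\Lambda$ by the definition $J(\lambda) = \sum_{\gamma \in \Gamma} \lambda_\gamma J(\gamma)$. The use of
randomized strategies is crucial in game theory [V1].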
2.2 Generality of the Model

The FSFM model is motivated by the control system described in
section 1.1. Besides the obvious limitation of the finiteness assumptions,
there are several features of the general engineering control
system of section 1.1 which are apparently not reflected in the FSFM 
model. The purpose of this section is to establish the generality 
of the FSFM model. It will be shown that the features of the general 
engineering control system can be incorporated in the FSFM model. This 
will be accomplished by reducing a set of apparently more general 
problems to the finite-state, finite-memory problem. 

First consider the case in which the control laws are allowed to
depend on the state only through a noisy observation

    $y(t) = g_t(x(t), \theta(t))$                                     (2.2.1)

where $\theta(t) \in \Theta_t$, $y(t) \in Y_t$, and $\Theta_t$, $Y_t$ are finite sets. The random
variables $\theta(t)$ are such that $\{x(0), \theta(0), q(1), \ldots, \theta(T-1), q(T)\}$
form a sequence of independent random variables. The problem is reduced
to the preceding one by letting $X_t \times Y_t$ be the new state set, $Q_t \times \Theta_t$
be the new uncertainty set and

    $\begin{pmatrix} x(t) \\ y(t) \end{pmatrix} = \begin{pmatrix} f_t(x(t-1), u(t), q(t)) \\ g_t(f_t(x(t-1), u(t), q(t)), \theta(t)) \end{pmatrix}$   (2.2.2)

be the new state equation. The set $\Gamma_t$ consists of maps
$\gamma_t : X_{t-1} \times Y_{t-1} \to U_t$ satisfying $\gamma_t(x_1, y) = \gamma_t(x_2, y)$
for all $y \in Y_{t-1}$ and $x_1, x_2 \in X_{t-1}$.
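
The reduction can be illustrated in a few lines. This is a hedged sketch: the sets `X`, `Y`, `U`, the functions `plant` and `sensor`, and the helper names are hypothetical, chosen only to show the augmented state equation and the constraint that an admissible law ignores the unobserved component of the augmented state.

```python
from itertools import product

X = [0, 1, 2]      # hypothetical state set
Y = [0, 1]         # hypothetical observation set
U = [0, 1]         # hypothetical input set

def plant(x, u, q):
    """Hypothetical state equation f_t."""
    return (x + u + q) % 3

def sensor(x, theta):
    """Hypothetical observation equation g_t."""
    return (x + theta) % 2

def augmented_step(f, g):
    """Build the augmented state equation from f_t and g_t."""
    def step(state, u, noise):
        x_prev, _y_prev = state      # the old observation is not needed
        q, theta = noise             # augmented uncertainty (q, theta)
        x_new = f(x_prev, u, q)
        return (x_new, g(x_new, theta))
    return step

def admissible_laws():
    """Maps (x, y) -> u on the augmented state that do not depend on x."""
    aug_states = list(product(X, Y))
    laws = []
    for values in product(U, repeat=len(aug_states)):
        law = dict(zip(aug_states, values))
        if all(law[(x1, y)] == law[(x2, y)]
               for x1 in X for x2 in X for y in Y):
            laws.append(law)
    return laws

step = augmented_step(plant, sensor)
next_state = step((1, 0), 1, (0, 1))
laws = admissible_laws()
print(next_state, len(laws))
```

Of the 64 maps on the augmented state set, only those determined by the observation survive the constraint, one for each map from `Y` to `U`.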

As a second example, suppose there are m observation equations

    $y^i(t) = g_t^i(x(t), \theta^i(t))$                               (2.2.3)

where for $i = 1, \ldots, m$, $y^i(t) \in Y_t^i$, $\theta^i(t) \in \Theta_t^i$, and $Y_t^i$, $\Theta_t^i$ are
finite sets. The random variables $\theta^i(t)$ satisfy independence conditions
similar to those of the variables of the first example. Moreover,
suppose that

    $U_t = U_t^1 \times U_t^2 \times \cdots \times U_t^m$             (2.2.4)

and that $u^i(t)$ is to be chosen on the basis of observation of $y^i(t-1)$
alone. This is a case of the dynamic team [M2]. The reduction to
the FSFM problem is accomplished by a state augmentation similar to
that of (2.2.2). In this case,

    $\Gamma_t = \Gamma_t^1 \times \Gamma_t^2 \times \cdots \times \Gamma_t^m$   (2.2.5)

where $\Gamma_{t+1}^i$ consists of maps from $X_t \times Y_t^1 \times \cdots \times Y_t^m$ to $U_{t+1}^i$ that
depend only on the variable $y^i(t) \in Y_t^i$.

For a third example, consider the case in which the control laws
are restricted to dependence on a finite memory set $M_t$. The state
space for this problem is $X_t \times M_t$, the control space is $U_t \times M_t$, and
the state equation is

    $\begin{pmatrix} x(t) \\ m(t) \end{pmatrix} = \begin{pmatrix} f_t(x(t-1), u(t), q(t)) \\ v(t) \end{pmatrix}$   (2.2.6)

where $v(t) \in M_t$. The control laws $\gamma_t : X_{t-1} \times M_{t-1} \to U_t \times M_t$ are of the
form $\gamma_t = (\tilde{\gamma}_t, \eta_t)$, where $\tilde{\gamma}_t : X_{t-1} \times M_{t-1} \to U_t$ and
$\eta_t : X_{t-1} \times M_{t-1} \to M_t$ is the memory update function.
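
The memory augmentation can be sketched as follows. Everything in the block is an illustrative assumption: the one-bit plant, the particular input rule (apply the stored bit), and the particular memory update (store the current state) are hypothetical and are not taken from the thesis.

```python
def augmented_f(f):
    """Augmented state equation: the memory component simply stores v."""
    def step(state, control, q):
        x_prev, _m_prev = state
        u, v = control
        return (f(x_prev, u, q), v)
    return step

def make_law(input_rule, memory_update):
    """Combine an input rule and a memory update into one control law."""
    def law(state):
        return (input_rule(state), memory_update(state))
    return law

def simulate(step, law, state0, qs):
    """Run the closed loop and return the sequence of augmented states."""
    states = [state0]
    for q in qs:
        states.append(step(states[-1], law(states[-1]), q))
    return states

def plant(x, u, q):
    """Hypothetical one-bit plant: input and noise each flip the state."""
    return x ^ u ^ q

def use_memory(state):
    """Hypothetical input rule: apply the stored bit."""
    _x, m = state
    return m

def store_state(state):
    """Hypothetical memory update: remember the current state."""
    x, _m = state
    return x

trajectory = simulate(augmented_f(plant),
                      make_law(use_memory, store_state),
                      (1, 0), [0, 0, 0])
print(trajectory)
```

The memory management task appears here as an ordinary control input `v`, which is the sense in which the memory update is optimized on the same footing as the actuator input.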

For a fourth example, suppose that there are two control stations.
Control station 1 can communicate with control station 2 through a
channel described by the equation

    $r^2(t) = w_t^2(s^1(t), \varepsilon^{12}(t))$                     (2.2.7)

where $s^1(t) \in S_t^1$ is the signal sent by control station 1, $r^2(t) \in R_t^2$
is the signal received by control station 2, and $\varepsilon^{12}(t) \in E_t^{12}$ is a
noise process. It is assumed that $R_t^2$, $S_t^1$, and $E_t^{12}$ are finite sets,
and that the random variables of the sequence $\{\varepsilon^{12}(t)\}_{t=1}^{T}$ are
independent of each other and all other random variables of the system.
This situation is handled by adding (2.2.7) to the state equations,
letting the control space of control station 1 be $U_t^1 \times M_t^1 \times S_t^1$, and by
letting the observation space of control station 2 be $Y_t^2 \times R_t^2$. The
control laws of control station 1 are of the form
$\gamma_t^1 : X_{t-1} \times M_{t-1}^1 \to U_t^1 \times M_t^1 \times S_t^1$, where
$\gamma_t^1 = (\tilde{\gamma}_t^1, \eta_t^1, \sigma_t^1)$. Here $\sigma_t^1 : X_{t-1} \times M_{t-1}^1 \to S_t^1$
is the encoder of control station 1.

As a final example, suppose that the cost function is of the form

    $J = \phi_T(x(0), x(T)).$                                         (2.2.8)

(This formulation is important when communication or statistical decision
problems are considered as FSFM problems.) This situation is handled
by redefining the state space to be $X_t \times X_0$ and adding an equation of
the form

    $z(t) = z(t-1), \quad t = 1, 2, \ldots, T,$                       (2.2.9)

to the state equations, where $z(0) = x(0)$.

It should be clear at this point that most of the important features 
of the general engineering control system of section 1.1 have been 
captured by the FSFM model. It is worth emphasizing that the memory
management and communication handling tasks of the control stations can 
be incorporated into the FSFM problem. Thus the crucial data processing 
problems of systems with multiple controllers can be examined on an 
equal footing with the choice of actuator inputs. 
2.3 Other Formulations

The case $\Gamma_t = U_t^{X_{t-1}}$ of the FSFM stochastic control problem is
the case of complete state information and has been extensively studied,
principally in the operations research literature [B1, How1, How2, H1,
Ku1]. The problem is usually referred to as a Markovian decision
process, and the formulation is slightly different. The state is not
defined by a state equation of the form (2.1.1), but is instead defined
as a controlled Markov chain with transition probability $p_{ij}^u(t)$.
This is the probability of a transition from state i to state j at
time t when input u is applied to the system. Of course, (2.1.1) defines
a Markov chain with transition probabilities

    $p_{ij}^u(t) = p_t(\{q : j = f_t(i, u, q)\}).$                    (2.3.1)

Since it is not difficult (in the finite state case) to realize a
given controlled Markov chain by a state equation, the two formulations
are in fact equivalent.^1
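
The passage from a state equation to transition probabilities is a direct summation, sketched below. The function name `transition_matrix`, the three-state plant, and the noise distribution are hypothetical and serve only to illustrate (2.3.1) for one fixed time and one fixed input.

```python
def transition_matrix(f, states, u, p):
    """Transition probabilities under input u: add the probability of
    every noise value q that carries state i to state j = f(i, u, q)."""
    P = {i: {j: 0.0 for j in states} for i in states}
    for i in states:
        for q, prob in p.items():
            P[i][f(i, u, q)] += prob
    return P

def plant(i, u, q):
    """Hypothetical state equation on {0, 1, 2}."""
    return (i + u + q) % 3

noise = {0: 0.7, 1: 0.3}                 # hypothetical distribution of q
chain = transition_matrix(plant, [0, 1, 2], 1, noise)
for i, row in chain.items():
    print(i, row)
```

Each row of the result sums to one because every noise value sends state i to exactly one successor state.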

An extension of the preceding problem is the case of incomplete
state information treated extensively in both the control and operations
research literature^2 [As1, H1, Dy1, St1, Ao1, So1, Sm1]. This problem
is also (for the finite-state, finite-horizon case) equivalent to a FSFM
stochastic control problem.^3 The incomplete state information is
described by observations of the form (2.2.1) that can be adjoined to
the state equation as in (2.2.2). Moreover, all previous observations
and controls are remembered. Therefore, the memory set is
$M_t = Y_0 \times U_1 \times Y_1 \times \cdots \times U_{t-1} \times Y_{t-1}$; the memory update functions
are constrained to sequentially storing the observations and controls
as they occur. Although the incomplete state information problem is
a special case of the general FSFM problem, the powerful perfect
memory assumption allows special techniques to be used that do not
apply to more general FSFM problems. These special techniques will be
discussed in more detail later.

1 Establishing the equivalence of the two formulations for the case of
continuous state space is more difficult and (to the author's knowledge)
an unresolved problem.

2 Control theorists have concentrated on the continuous state space case.
The treatment is usually quite formal; certain conditional probability
densities which may or may not be well defined are used extensively.

3 The infinite-horizon version of the problem cannot be conveniently
handled by the FSFM techniques.

The case T=1 of the FSFM problem includes both the non-sequential
Bayesian statistical decision problem [Sa1, Ra1] and the team decision
problem [M1, M2, R1, R2] (for finite sets). The sequential Bayesian
problem (with perfect memory) is actually a special case of the
Markovian decision problem with incomplete state information and is
therefore a FSFM model. A sequential problem (hypothesis testing)
with a 1-bit (hence imperfect) memory is treated in Chapter 6.

Witsenhausen has given several stochastic control models that are
slightly less general than the FSFM model when restricted to finite sets
[W3, W4]. Witsenhausen shows that any sequential stochastic control
problem can be reduced to a certain standard form. The FSFM model is
a sequential stochastic control problem if the sets $\Gamma_t$ satisfy the
condition $\Gamma_t = \{\gamma \in U_t^{X_{t-1}} : \gamma^{-1}(\mathcal{U}_t) \subset \mathcal{I}_t\}$ for $t = 1, 2, \ldots, T$, where
$\mathcal{U}_t = \mathcal{P}(U_t)$ and $\mathcal{I}_t$ is a subfield of $\mathcal{X}_{t-1} = \mathcal{P}(X_{t-1})$. In this case, the FSFM
model is said to have a simple information constraint. Thus the FSFM
problem is more general than the sequential stochastic control problem
since the most general constraint on the control laws is assumed.

Even if a stochastic control problem has a simple information
constraint, it may be preferable to reduce the problem to a FSFM model
rather than to Witsenhausen standard form. As Witsenhausen says,
"... alternative reductions leading to standard models with simpler
state spaces may be possible in specific cases" [W4]. For problems
with stochastic inputs which are independent from one time to the next,
reduction to the FSFM model rather than to the standard model results
in a simpler state space, but a more complicated state equation. It
may be possible to formulate a FSFM problem with a fixed finite state
set while the corresponding standard model requires a growing state set.
This is an important computational advantage in general, and a crucial
advantage when the infinite horizon problem is considered. In fact,
the motivation for the development of the FSFM model was the development
of a special class of Witsenhausen-type models for which an infinite
horizon problem could be formulated.

Games in extensive form are a class of problems more general than
FSFM problems. The original formulation due to von Neumann and Morgenstern
[V1] was improved upon by Kuhn [K1, K2] and subsequently by Aumann [Au1]
and Witsenhausen [W2]. The theory of extensive games is more general
than stochastic control theory in two significant ways. First, there
are in general N players, each with a different cost function. Second,
the theory of extensive games (in the Kuhn and Witsenhausen formulations)
does not require that the time order in which the various decision
variables are selected is fixed in advance. The fact that there is
more than one cost function is the essential complication of game
theory as opposed to control theory. However, as Witsenhausen [W2]
has pointed out, the non-sequential ordering of decision variables in
extensive game theory is also perfectly appropriate in the context of
control theory. However, aside from Witsenhausen's causality
condition for well-posedness [W2], essentially nothing is known about
non-sequential stochastic control problems.

The FSFM model is related particularly closely to the Kuhn model
of an extensive game. According to Kuhn^1, an extensive game is a game
tree with

(i) a partition of the vertices with alternatives into the
chance moves $P_0$ and player moves $P_1, \ldots, P_n$,

(ii) a partition of the moves of $P_i$ into information sets,

(iii) a probability distribution on the alternatives of the
information sets of $P_0$,

(iv) an n-tuple of real numbers for each terminal vertex.

An example of a Kuhn-type extensive game is shown in Figure 2.3.1.
There is one chance move in $P_0$ with four alternatives. Each alternative
consists of the choice of an outcome of tossing two pennies. Thus
each outcome occurs with probability 1/4. There are four moves in $P_1$,
and player one's information set is equal to $P_1$. Thus player one does
not know the outcome of the first chance move. He has to guess if the
pennies match or don't match. If he guesses correctly, he gets to keep
his own penny and player two's penny (the payoff is (+1, -1)). If
he guesses incorrectly, he loses his penny to player two (the payoff is
(-1, +1)).

1 See [K2] for a complete exposition.

Figure 2.3.1 Matching Pennies (game tree; each terminal payoff is (+1, -1) or (-1, +1))

Every FSFM problem can be reduced to a Kuhn extensive game. It might
be thought that the reduction is accomplished by identifying the player's
alternatives with the controller's inputs, but this is not always
possible. Suppose, for example, that $X_0 = \{1,2\}$, $U_1 = \{0,1\}$, and
$\Gamma_1 = \{\gamma_1, \bar{\gamma}_1\}$, where $\gamma_1(1) = 1$, $\gamma_1(2) = 0$ and
$\bar{\gamma}_1 = 1 - \gamma_1$.^1 Clearly,
the game tree for this problem must have its first seven nodes as
in Figure 2.3.2, with vertices 1 and 2 in the set of moves of
player one (the only player). However, it is not possible to partition
$P_1$ into information sets so that the restriction that the same alternative
must be chosen for each vertex in a given information set is equivalent
to the restriction that the control law must lie in $\Gamma_1$. The point is
that restricting the control laws to lie in an arbitrary subset of
$U_1^{X_0}$ is a more general restriction than one based on information.

Thus, it is in general necessary to identify the player's alternatives
with the set of control laws. This is undesirable since the game then does
not exhibit the information properties of the FSFM problem. However,
it will be shown in Chapter 3 that the first reduction (identifying
alternatives with controller inputs) is possible for FSFM problems with
a simple information constraint.

1 The choice of $\Gamma_1$ seems unnatural, but has appeared in the literature
[Sta1]. The control laws in $\Gamma_1$ are the closed-loop control laws; those
in $U_1^{X_0} - \Gamma_1$ are the open-loop control laws.

Figure 2.3.2 Game Tree for FSFM Problem

2.4 Example

In this section, an example of a FSFM stochastic control problem is
given. The problem is sufficiently simple that a solution can be
written down by inspection. However, it does illustrate the signaling
strategy, a key phenomenon that occurs only in non-classical (as opposed
to classical) stochastic control problems.

Figure 2.4.1 illustrates the problem considered. The initial state
x(0) is random, with P(x(0)=1) = P(x(0)=2) = 1/2. The objective is to
choose the controls u(1), u(2) so that x(0) = x(2). If x(0) ≠ x(2), there
is a penalty of 1 unit, and there is an additional penalty of k > 0
units if x(1) = 3. The control u(1) at time 1 is allowed to depend
on x(0).

If the problem is to be a classical stochastic control problem, the
control at t=2 must be allowed to depend on x(0) and u(1). In this case
the solution is trivial. For t=1, always choose u=1. For t=2, choose
u=1 if x=1 and u=0 if x=2. The resulting expected cost is EJ = 0.

Suppose on the other hand that the control u(2) is allowed to depend
on the event x(1)=3 only. Then, if k < 1, an optimal strategy at t=1
is to choose u(1)=1 if x(0)=1 and u(1)=0 if x(0)=2. The corresponding
optimal strategy for t=2 is to choose u(2)=0 if x(1)=3 and u(2)=1 if
x(1)=1 or x(1)=2. The expected cost is EJ = k/2.

The strategy employed in the choice of the first control for the
second case is referred to as a signaling strategy. The interpretation of
this statement is the following. If x(0)=2, the first controller moves
the state to x(1)=3, which is undesirable for control purposes (there is a
penalty k < 1). However, the second controller is able to see that x(1)=3,
and so he unfailingly knows that the first state was x(0)=2. He can then
avoid the penalty for being in the wrong terminal state.

The terminology signaling strategy arises in the theory of extensive
games [T1]. If the present example is viewed as a (1-player) extensive
game, it has a Kuhn game tree [K1, K2] as shown in Figure 2.4.2. Notice
that the states have been eliminated, and only the sequence of decisions
is exhibited (the choice $x_0 = 1$ or $x_0 = 2$ is a decision due to nature). Note
that the vertices of the game tree are partitioned into information sets.
Thus in Figure 2.4.2b, the second decision must be made on the basis
only of the knowledge that the event x(0)=2 and u(1)=0 did or did not
occur. The control law must therefore pick out the same alternative for
each vertex within a given information set.

In terms of the game tree, the notion of a signaling strategy can be
given a precise definition. Consider the first-stage information set in
Figure 2.4.2b. The set of all vertices following the choice u=0 does
not contain the second-stage information set that it intersects, so
according to Thompson's definition [T1], the first-stage set is
a signaling information set, and any strategy (control law) defined on
it is a signaling strategy. In contrast, the set of all vertices following
the choice u=0 for $V_2$ in Figure 2.4.2a contains $V_4$, and a similar
statement holds for $V_5$. Thus $V_2$ in Figure 2.4.2a is not a signaling
information set. This situation may be summed up succinctly as follows.
In $V_4$ and $V_5$, the player (controller) remembers everything he knew in
$V_2$ (Figure 2.4.2a). In the intersected second-stage information set of
Figure 2.4.2b, the player has forgotten nature's choice and
his own previous decision.




Figure 2.4.2 (a) Perfect State Observation (b) Imperfect State Observation

In the next chapter, a minimum principle is established for the FSFM
stochastic control problem, and it is verified that the optimal strategies
satisfy the minimum principle. The importance of the concept of the
signaling strategy is that when there are no signaling strategies present,
the minimum principle can be strengthened to give a sufficient condition.

2.5 The Equivalent Deterministic Problem

In this section, a deterministic optimal control problem equivalent
to the FSFM problem is derived. The equivalent problem can be used to
obtain necessary and sufficient conditions for the optimality of a sequence
$\gamma^*$ of control laws for the FSFM problem.

Since the FSFM stochastic control problem with a simple information
constraint is a special case of the general sequential stochastic
control problem, it could be transformed to Witsenhausen's standard form
and the general optimality conditions applied [N4]. However, the FSFM
problem has a special structure that can be usefully exploited in the 
development of optimality conditions. These conditions are expressed in 
terms of the equivalent deterministic problem derived in this section. 

The deterministic problem for certain important special cases of the FSFM 
problem has a state space of fixed, finite dimension in contrast to the 
growing state space required in general. Moreover, the assumption of a 
simple information constraint is unnecessary. 

It should not be inferred from the preceding remarks that the 
equivalent deterministic problem derived in this section has the most 
efficient state space for all stochastic control problems that can be 
cast in the FSFM format. In fact, for perfect memory problems, and for 
certain sequential hypothesis testing problems, more efficient equivalent 
deterministic problems can be derived utilizing the special structure of 
these problems. 

The state space of the deterministic problem equivalent to the FSFM
problem is the set of probability vectors on the original state set $X_t$.
Since $X_t$ is finite, there is no loss in generality in assuming that
$X_t = \{1, 2, \ldots, n_t\}$. Let $\Pi_t$ be the set of probability (row)
vectors in $R^{n_t}$, i.e.,

$$\sum_{i=1}^{n_t} \pi_i = 1 \quad \text{and} \quad \pi_i \ge 0, \quad i = 1, 2, \ldots, n_t.$$

For $\gamma_t \in \Gamma_t$, let $h^{\gamma_t}(t)$ be the (column) vector in
$R^{n_{t-1}}$ with components $h_t(i, \gamma_t(i))$, $i = 1, 2, \ldots, n_{t-1}$.
Similarly, let $\phi_T$ be the column vector in $R^{n_T}$ with components
$\phi_T(i)$, $i = 1, 2, \ldots, n_T$.

Finally, for each $\gamma_t \in \Gamma_t$, define matrices $P^{\gamma_t}(t)$ with
components

$$P^{\gamma_t}_{ij}(t) = p_t(\{q : j = f_t(i, \gamma_t(i), q)\}) \tag{2.5.1}$$

where $i \in X_{t-1}$ and $j \in X_t$. Clearly, $P^{\gamma_t}(t)$ is a stochastic matrix
(its rows sum to one, and its elements are non-negative). Notice that the
matrices $P^{\gamma_t}(t)$, $\gamma_t \in \Gamma_t$, can be determined by the matrices
$P^{u_t}(t)$, $u_t \in U_t$, where $P^{u_t}(t)$ is the stochastic matrix with
components

$$P^{u_t}_{ij}(t) = p_t(\{q : j = f_t(i, u_t, q)\}) \tag{2.5.2}$$

for $i \in X_{t-1}$, $j \in X_t$. If $\gamma_t(i) = u_t$, then row i of $P^{\gamma_t}(t)$ is
equal to row i of $P^{u_t}(t)$.

Let $\pi(0) = \pi_0$, and define $\pi(t)$ by the equations

$$\pi(t) = \pi(t-1)\, P^{\gamma_t}(t) \tag{2.5.3}$$

for t = 1, 2, ..., T. Clearly, $\pi(t)$ corresponds to the marginal probability
measure induced by $\gamma$ on $X_t$. That is, $\pi_i(t)$ is the unconditional
probability that x(t) = i when the control law sequence
$\gamma = (\gamma_1, \ldots, \gamma_t)$ is used.
It follows immediately that

$$J(\gamma) = E^{\gamma} J = \pi(T)\, \phi_T + \sum_{t=1}^{T} \pi(t-1)\, h^{\gamma_t}(t). \tag{2.5.4}$$

Therefore, the FSFM stochastic control problem is equivalent to the
deterministic problem of minimizing (2.5.4) subject to (2.5.3).
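
The recursion (2.5.3) and the cost (2.5.4) can be sketched in a few lines
of Python. This is only a minimal illustration: the two-state, one-stage
data and the function names are hypothetical and do not come from the
thesis, and states and controls are indexed from 0. It builds the
closed-loop matrix row by row from the matrices for fixed controls, as
described above, and then evaluates the expected cost exactly.

```python
from fractions import Fraction as Fr

def closed_loop_matrix(P_u, gamma):
    """Row i of P^gamma(t) is row i of P^u(t) for u = gamma[i]."""
    return [P_u[gamma[i]][i] for i in range(len(gamma))]

def closed_loop_cost(h_u, gamma):
    """Component i of h^gamma(t) is h_t(i, gamma[i])."""
    return [h_u[gamma[i]][i] for i in range(len(gamma))]

def step(pi, P):
    """State equation (2.5.3): pi(t) = pi(t-1) P^gamma(t)."""
    return [sum(pi[i] * P[i][j] for i in range(len(pi)))
            for j in range(len(P[0]))]

def dot(row, col):
    return sum(r * c for r, c in zip(row, col))

def expected_cost(pi0, stages, phi_T):
    """Cost (2.5.4); stages[t-1] = (P^gamma_t(t), h^gamma_t(t))."""
    pi, total = list(pi0), 0
    for P, h in stages:
        total += dot(pi, h)      # pi(t-1) h^gamma_t(t)
        pi = step(pi, P)         # advance the probability vector
    return total + dot(pi, phi_T)

# Hypothetical one-stage data.
P_u = {0: [[1, 0], [0, 1]],      # u = 0: stay in the current state
       1: [[0, 1], [1, 0]]}      # u = 1: switch state
h_u = {0: [0, 0], 1: [Fr(1, 4), Fr(1, 4)]}
gamma = [1, 0]                   # switch in state 0, stay in state 1
pi0 = [Fr(1, 2), Fr(1, 2)]
phi_T = [1, 0]

P_g = closed_loop_matrix(P_u, gamma)
h_g = closed_loop_cost(h_u, gamma)
J = expected_cost(pi0, [(P_g, h_g)], phi_T)
```

For this data the closed-loop matrix sends both states to state 1, so the
only cost incurred is the stage cost paid in state 0.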

Further insight into the nature of the equivalent deterministic 
problem (2.5.3), (2.5.4) can be obtained by considering randomized 
strategies. Attention is restricted to the class of behavioral strategies 
[K2]. This is a subclass of the general class of randomized strategies 
defined in Section 2.1. 

A behavioral randomization is a set of non-negative numbers
$\{\lambda_{\gamma_t}(t)\}$ satisfying

$$\sum_{\gamma_t \in \Gamma_t} \lambda_{\gamma_t}(t) = 1 \tag{2.5.5}$$

for t = 1, 2, ..., T. In this case, the control law $\gamma_t \in \Gamma_t$ is chosen
with probability $\lambda_{\gamma_t}(t)$ independently of the choice of $\gamma_\tau$,
$\tau \neq t$. Notice that it is not possible to coordinate the choice of
strategies over time (unless the strategy at every stage is pure^1) so that
behavioral randomization is not the most general randomization.

In terms of the behavioral strategy, the state equation (2.5.3) becomes

$$\pi(t) = \pi(t-1) \Big( \sum_{\gamma_t \in \Gamma_t} \lambda_{\gamma_t}(t)\, P^{\gamma_t}(t) \Big), \quad t = 1, 2, \ldots, T, \tag{2.5.6}$$

where $\pi(0) = \pi_0$. The cost function is

$$J(\gamma) = \pi(T)\, \phi_T + \sum_{t=1}^{T} \pi(t-1) \Big( \sum_{\gamma_t \in \Gamma_t} \lambda_{\gamma_t}(t)\, h^{\gamma_t}(t) \Big). \tag{2.5.7}$$

1 The behavioral strategy is pure if $\lambda_{\gamma_t}(t) = 1$ for some
$\gamma_t \in \Gamma_t$, t = 1, 2, ..., T.

Equations (2.5.6)-(2.5.7) show that the FSFM problem is equivalent to a
deterministic optimal control problem with bilinear state dynamics and
bilinear cost functional. Moreover, since the optimal strategy is known
to be pure (as pointed out in Section 2.1), the problem is known
a priori to be "bang-bang". The fact that the FSFM problem is equivalent
to a bilinear problem is intriguing since there has been a considerable
amount of research devoted to these systems recently [Br1, Mo1, Wi1].
However, this equivalence will not be exploited in the sequel.
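
The bilinear structure can be made concrete with a small numerical check.
The sketch below uses hypothetical data (two pure control laws at a single
stage, with a given probability vector and costate) and only illustrates
one consequence of (2.5.6)-(2.5.7): with everything else held fixed, the
one-stage quantity $\pi P \phi + \pi h$ of a behavioral mixture is the
$\lambda$-weighted average of the pure values, so it is never smaller than
the best pure value. It is not a proof that optimal strategies are pure.

```python
from fractions import Fraction as Fr

def mat_vec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def dot(row, col):
    return sum(r * c for r, c in zip(row, col))

def stage_value(pi, P, h, phi):
    """pi P phi + pi h for one stage."""
    return dot(pi, mat_vec(P, phi)) + dot(pi, h)

def mix_vectors(weights, vectors):
    return [sum(w * v[k] for w, v in zip(weights, vectors))
            for k in range(len(vectors[0]))]

def mix_matrices(weights, matrices):
    return [mix_vectors(weights, [M[r] for M in matrices])
            for r in range(len(matrices[0]))]

# Hypothetical data: two pure control laws a and b at one stage.
P_a, h_a = [[1, 0], [0, 1]], [0, 1]
P_b, h_b = [[0, 1], [1, 0]], [Fr(1, 2), 0]
pi = [Fr(1, 3), Fr(2, 3)]       # pi(t-1)
phi = [2, 0]                    # costate phi(t)
lam = [Fr(1, 4), Fr(3, 4)]      # behavioral randomization over {a, b}

H_a = stage_value(pi, P_a, h_a, phi)
H_b = stage_value(pi, P_b, h_b, phi)
H_mix = stage_value(pi,
                    mix_matrices(lam, [P_a, P_b]),
                    mix_vectors(lam, [h_a, h_b]),
                    phi)
weighted = lam[0] * H_a + lam[1] * H_b
```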

In general, the FSFM model is an efficient representation of a given
stochastic control problem when the state set of the FSFM problem is a
fixed, finite set not too much larger than the original state set. This
will generally be the case when the controller has a fixed, finite memory,
the noise is independent from stage to stage, and the cost has a stagewise
additive structure. For problems of this type, the equivalent
deterministic problem has a state space of fixed, finite dimension, in
contrast to the growing state space required by the Witsenhausen
standard form. This simplification is achieved by admitting slight
complications into the structure of the deterministic problem. By
comparison, in the deterministic problem equivalent to the Witsenhausen
standard form, the matrices corresponding to the $P^{\gamma_t}(t)$ are
stochastic matrices with all elements either zero or one, and only a
terminal cost is required.

When the controller has perfect memory, its memory set expands and
so must its state set. Thus the deterministic version of the corresponding
FSFM problem requires a growing state space. A more efficient equivalent
deterministic problem is obtained by taking the conditional probability
vector of the original state set given past observations as the
deterministic state. This approach has been followed, implicitly or
explicitly, by a number of authors [Ao1, As1, So1, Sm1, Saw1].



CHAPTER III 


THE FSFM MINIMUM PRINCIPLE 

In this chapter, a minimum principle is stated and derived. The
minimum principle is a necessary condition for optimality, but is not
sufficient in general. However, in the absence of signaling control laws,
the minimum principle can be strengthened to obtain a sufficient condition.

A numerical optimization algorithm based on the minimum principle
is developed. It is shown that the algorithm always converges to a
person-by-person extremal.

3.1 Derivation of the FSFM Minimum Principle

In the previous section, it was shown that the FSFM stochastic control
problem is equivalent to a deterministic optimal control problem with
cost functional

$$J(\gamma) = \pi(T)\, \phi_T + \sum_{t=1}^{T} \pi(t-1)\, h^{\gamma_t}(t) \tag{3.1.1}$$

and state equations

$$\pi(t) = \pi(t-1)\, P^{\gamma_t}(t), \quad t = 1, 2, \ldots, T \tag{3.1.2}$$

where $\pi(0) = \pi_0$ is given. Notice that each $\Gamma_t$ is a discrete set, so that
the convexity assumption required for application of the discrete
minimum principle [Hal1, Hol1] is not satisfied. Therefore, the proof
presented in this section proceeds from first principles.

Since the state dynamics of the equivalent deterministic problem are
linear in the state, it is useful to consider the adjoint system of
(3.1.2). Let $\Phi$ be the set of (column) vectors $\phi \in R^n$. Then the product

$$\pi \phi = \sum_{x \in X} \pi_x \phi_x \tag{3.1.3}$$

is defined in accord with the usual matrix-vector notation. Holding $\phi$
fixed, a linear functional on $\Pi$ is defined, and conversely.

Define the forced adjoint or costate equation

$$\phi(t-1) = P^{\gamma_t}(t)\, \phi(t) + h^{\gamma_t}(t) \tag{3.1.4}$$

for t = 1, 2, ..., T, where $\phi(T) = \phi_T$ is the terminal cost vector.

Lemma 3.1.1

Let $\gamma = (\gamma_1, \gamma_2, \ldots, \gamma_T)$ be a fixed control law sequence. Let the
corresponding state and costate sequences be defined by (3.1.2) and
(3.1.4), where $\pi(0) = \pi_0$ and $\phi(T) = \phi_T$. Then,

$$\pi(t)\, \phi(t) = \sum_{\tau=t+1}^{T} \pi(\tau-1)\, h^{\gamma_\tau}(\tau) + \pi(T)\, \phi(T). \tag{3.1.5}$$

Proof

The proof is by backward induction. Equation (3.1.5) is clearly
valid for t = T. If it is valid for general t, then

$$\pi(t-1)\, \phi(t-1) = \pi(t-1) \big( P^{\gamma_t}(t)\, \phi(t) + h^{\gamma_t}(t) \big)$$
$$= \pi(t)\, \phi(t) + \pi(t-1)\, h^{\gamma_t}(t)$$
$$= \sum_{\tau=t}^{T} \pi(\tau-1)\, h^{\gamma_\tau}(\tau) + \pi(T)\, \phi(T) \tag{3.1.6}$$

so that (3.1.5) is valid for t-1. Therefore, the equation is valid for
t = T, T-1, ..., 0.
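
The identity (3.1.5) is easy to check numerically. The following sketch
uses hypothetical two-stage, two-state data (not taken from the thesis),
runs the state equation (3.1.2) forward and the costate equation (3.1.4)
backward, and compares both sides of (3.1.5) for every t using exact
rational arithmetic.

```python
from fractions import Fraction as Fr

def vec_mat(pi, P):
    return [sum(pi[i] * P[i][j] for i in range(len(pi)))
            for j in range(len(P[0]))]

def mat_vec(P, v):
    return [sum(p * x for p, x in zip(row, v)) for row in P]

def dot(row, col):
    return sum(r * c for r, c in zip(row, col))

# Hypothetical closed-loop data for a fixed control law sequence, T = 2.
T = 2
P = {1: [[Fr(1, 2), Fr(1, 2)], [0, 1]],
     2: [[0, 1], [1, 0]]}
h = {1: [1, 0], 2: [0, 2]}
pi0 = [Fr(1, 3), Fr(2, 3)]
phi_T = [1, 3]

# Forward pass: state equation (3.1.2).
pi = {0: pi0}
for t in range(1, T + 1):
    pi[t] = vec_mat(pi[t - 1], P[t])

# Backward pass: costate equation (3.1.4).
phi = {T: phi_T}
for t in range(T, 0, -1):
    phi[t - 1] = [a + b for a, b in zip(mat_vec(P[t], phi[t]), h[t])]

def right_hand_side(t):
    """Remaining stage costs plus terminal cost, as in (3.1.5)."""
    stage = sum(dot(pi[tau - 1], h[tau]) for tau in range(t + 1, T + 1))
    return stage + dot(pi[T], phi[T])

lemma_holds = all(dot(pi[t], phi[t]) == right_hand_side(t)
                  for t in range(T + 1))
total_cost = dot(pi[0], phi[0])   # equals J for this sequence
```

At t = 0 the left-hand side is $\pi_0 \phi(0)$, so a single backward pass
of the costate yields the total cost of the sequence.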


Theorem 3.1.2 (FSFM Minimum Principle)

If the sequence $\gamma^0 = (\gamma_1^0, \gamma_2^0, \ldots, \gamma_T^0)$ is optimal for the FSFM
stochastic control problem and $\pi^0(t)$, $\phi^0(t)$ are the associated state and
costate sequences satisfying

$$\pi^0(t) = \pi^0(t-1)\, P^{\gamma_t^0}(t), \quad \pi^0(0) = \pi_0 \tag{3.1.7}$$

$$\phi^0(t-1) = P^{\gamma_t^0}(t)\, \phi^0(t) + h^{\gamma_t^0}(t), \quad \phi^0(T) = \phi_T \tag{3.1.8}$$

then

$$\pi^0(t-1)\, P^{\gamma_t^0}(t)\, \phi^0(t) + \pi^0(t-1)\, h^{\gamma_t^0}(t)
\le \pi^0(t-1)\, P^{\gamma_t}(t)\, \phi^0(t) + \pi^0(t-1)\, h^{\gamma_t}(t) \tag{3.1.9}$$

for all $\gamma_t \in \Gamma_t$, for all t = 1, 2, ..., T.


Proof

From Lemma 3.1.1 and equation (3.1.1),

$$J(\gamma_1^0, \ldots, \gamma_{t-1}^0, \gamma_t^0, \gamma_{t+1}^0, \ldots, \gamma_T^0)
= \sum_{\tau=1}^{t-1} \pi^0(\tau-1)\, h^{\gamma_\tau^0}(\tau) + \pi^0(t-1)\, \phi^0(t-1)$$
$$= \sum_{\tau=1}^{t-1} \pi^0(\tau-1)\, h^{\gamma_\tau^0}(\tau) + \pi^0(t-1)\, P^{\gamma_t^0}(t)\, \phi^0(t) + \pi^0(t-1)\, h^{\gamma_t^0}(t). \tag{3.1.10}$$

Similarly,

$$J(\gamma_1^0, \ldots, \gamma_{t-1}^0, \gamma_t, \gamma_{t+1}^0, \ldots, \gamma_T^0)
= \sum_{\tau=1}^{t-1} \pi^0(\tau-1)\, h^{\gamma_\tau^0}(\tau) + \pi^0(t-1)\, P^{\gamma_t}(t)\, \phi^0(t) + \pi^0(t-1)\, h^{\gamma_t}(t). \tag{3.1.11}$$

Notice that $\pi(t)$ is independent of $\gamma_\tau$, $\tau > t$, and $\phi(t)$ is
independent of $\gamma_\tau$, $\tau \le t$.

Since $\gamma_1^0, \gamma_2^0, \ldots, \gamma_T^0$ is optimal,

$$J(\gamma_1^0, \ldots, \gamma_{t-1}^0, \gamma_t^0, \gamma_{t+1}^0, \ldots, \gamma_T^0)
\le J(\gamma_1^0, \ldots, \gamma_{t-1}^0, \gamma_t, \gamma_{t+1}^0, \ldots, \gamma_T^0) \tag{3.1.12}$$

and (3.1.9) follows immediately.

Although the minimum principle is a necessary condition for optimality,
it is not in general sufficient. This is hardly surprising, since only the
condition (3.1.12) of the optimal control law sequence has been utilized.
Control law sequences other than the optimal one can satisfy (3.1.12). Such
sequences are called extremal. Thus Theorem 3.1.2 has the key ingredients
of a minimum principle. The Hamiltonian minimization is global since
every $\gamma_t \in \Gamma_t$ must be tested. However, the overall minimization of the
cost functional is local, since the test is performed for a single,
isolated time instant. This is completely analogous to the continuous
time situation in which large variations in the control for infinitesimal
time intervals (the "strong variations" of the calculus of variations) are
used to derive the minimum principle [Po1, A1].

3.2 Examples

In this section, two examples illustrating the application of the
minimum principle are given.


Example 1

This example shows that the minimum principle is not in general a
sufficient condition for optimality. The example has two stages (T=2)
and is defined as follows:

$$X_0 = X_1 = X_2 = \{1,2\}, \quad U_1 = U_2 = \{0,1\}, \quad Q_1 = \{1\}, \quad Q_2 = \{1,2,3\},$$

$$\pi_0 = [\,1 \;\; 0\,], \quad \phi_2 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \quad h_t = 0.$$

The sets of admissible control laws $\Gamma_1 = \Gamma_2$ have just two elements:
the control law whose value is always 0 and the control law whose value
is always 1. The probabilities of the elements of $Q_2$ are

$$p_2(1) = \tfrac{1}{2}, \quad p_2(2) = p_2(3) = \tfrac{1}{4}.$$

The state transition functions $f_1 : X_0 \times U_1 \times Q_1 \to X_1$ and
$f_2 : X_1 \times U_2 \times Q_2 \to X_2$ are defined by

$f_1(1,0,1) = 1$,  $f_1(2,0,1) = 2$,  $f_1(1,1,1) = 2$,  $f_1(2,1,1) = 1$

$f_2(1,0,1) = 1$,  $f_2(1,0,2) = 2$,  $f_2(1,0,3) = 2$
$f_2(2,0,1) = 1$,  $f_2(2,0,2) = 1$,  $f_2(2,0,3) = 2$
$f_2(1,1,1) = 1$,  $f_2(1,1,2) = 1$,  $f_2(1,1,3) = 2$
$f_2(2,1,1) = 2$,  $f_2(2,1,2) = 2$,  $f_2(2,1,3) = 2$

It is not hard to verify that the corresponding transition matrices
at t=1 are

$$P^0(1) = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad
P^1(1) = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$$

and at t=2 are

$$P^0(2) = \begin{bmatrix} \tfrac{1}{2} & \tfrac{1}{2} \\ \tfrac{3}{4} & \tfrac{1}{4} \end{bmatrix}, \quad
P^1(2) = \begin{bmatrix} \tfrac{3}{4} & \tfrac{1}{4} \\ 0 & 1 \end{bmatrix}.$$

Suppose that $\gamma_1^* = 0$ and that $\gamma_2^* = 0$. It is necessary to compute
$\pi^*(1)$ and $\phi^*(1)$ in order to apply the minimum principle. These are easily
found:

$$\pi^*(1) = [\,1 \;\; 0\,], \quad \phi^*(1) = \begin{bmatrix} \tfrac{1}{2} \\ \tfrac{3}{4} \end{bmatrix}.$$

The cost is $J(\gamma_1^*, \gamma_2^*) = \tfrac{1}{2}$. Note that

$$\pi^*(0)\, P^0(1)\, \phi^*(1) = \tfrac{1}{2} < \pi^*(0)\, P^1(1)\, \phi^*(1) = \tfrac{3}{4}$$

$$\pi^*(1)\, P^0(2)\, \phi^*(2) = \tfrac{1}{2} < \pi^*(1)\, P^1(2)\, \phi^*(2) = \tfrac{3}{4}$$

so that the necessary conditions of the minimum principle are satisfied.
However, if $\gamma_1 = 1$ and $\gamma_2 = 1$, then $J(\gamma_1, \gamma_2) = 0$ so that
$(\gamma_1^*, \gamma_2^*)$ is not optimal.
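
The claims of Example 1 can be checked by brute force. The sketch below is
an illustration only: the matrices are the ones obtained from the
transition table and probabilities of the example, the function names are
hypothetical, and the extremality test is written for the special case of
zero stage costs that applies here. It evaluates the cost of all four
control law sequences and tests condition (3.1.9) for each.

```python
from fractions import Fraction as Fr
from itertools import product

half, quarter = Fr(1, 2), Fr(1, 4)
# P[t][u]: transition matrix at stage t under the constant control law u.
P = {
    1: {0: [[1, 0], [0, 1]], 1: [[0, 1], [1, 0]]},
    2: {0: [[half, half], [3 * quarter, quarter]],
        1: [[3 * quarter, quarter], [0, 1]]},
}
pi0 = [1, 0]      # initial state distribution
phi_T = [1, 0]    # terminal cost vector; stage costs are zero

def vec_mat(pi, M):
    return [sum(pi[i] * M[i][j] for i in range(len(pi)))
            for j in range(len(M[0]))]

def mat_vec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cost(u1, u2):
    return dot(vec_mat(vec_mat(pi0, P[1][u1]), P[2][u2]), phi_T)

def is_extremal(u1, u2):
    """Check (3.1.9) at both stages for the sequence (u1, u2)."""
    pi1 = vec_mat(pi0, P[1][u1])
    phi1 = mat_vec(P[2][u2], phi_T)     # costate (3.1.4) with h = 0
    H1 = {u: dot(pi0, mat_vec(P[1][u], phi1)) for u in (0, 1)}
    H2 = {u: dot(pi1, mat_vec(P[2][u], phi_T)) for u in (0, 1)}
    return H1[u1] == min(H1.values()) and H2[u2] == min(H2.values())

sequences = list(product((0, 1), repeat=2))
extremals = [s for s in sequences if is_extremal(*s)]
best = min(sequences, key=lambda s: cost(*s))
```

Under these assumptions the sequence (0, 0) passes the test although its
cost exceeds that of (1, 1), which is the point of the example.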


Example 2

This example shows that the optimal control laws determined in
section 2.4 for the example considered there satisfy the minimum principle.

The problem as formulated in section 2.4 has state sets $X_0 = \{1,2\}$,
$X_1 = \{1,2,3\}$, and $X_2 = \{1,2\}$. The control sets are $U_1 = U_2 = \{0,1\}$, and
the uncertainty sets are $Q_1 = Q_2 = \{1\}$. The transition functions are
illustrated in Figure 2.4.1. The cost function is

$$J = h_2(x(1)) + g(x(0), x(2)) \tag{3.2.1}$$

where

$$h_2(x(1)) = \begin{cases} k, & x(1) = 3 \\ 0, & x(1) \neq 3 \end{cases}
\qquad
g(x(0), x(2)) = \begin{cases} 0, & x(0) = x(2) \\ 1, & x(0) \neq x(2). \end{cases}$$

Since the cost function does not have the stagewise additive form
(2.1.2), it is necessary to augment the state to put the problem into
the FSFM formulation. The idea is to carry along x(0) in the state
equations so that the term g(x(0), x(2)) can be written in terms of the
terminal value of the augmented state.

When new state sets $X_0 = \{1,2\}$, $X_1 = \{1,2,3,4\}$, $X_2 = \{1,2,3,4\}$ are
defined, the state transition diagram of Figure 3.2.1 results. Clearly, the
transition matrices can be written down by inspection; for instance,

$$P^{0}(2) = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 1 \end{bmatrix}. \tag{3.2.2}$$

(As pointed out in section 2.5, the matrices $P^{\gamma_t}(t)$ can be found if the
matrices $P^{u_t}(t)$ are known.)

Figure 3.2.1 State Transition Diagram

\_1 

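For the case $\Gamma_2 = U_2^{X_1}$, the optimal control laws found in section 2.4
are $\gamma_1^*(1) = 1$, $\gamma_1^*(2) = 1$, and $\gamma_2^*(1) = 1$, $\gamma_2^*(2) = 0$,
$\gamma_2^*(3) = 0$, $\gamma_2^*(4) = 0$. The corresponding $\pi^*(1)$ and $\phi^*(1)$ are

$$\pi^*(1) = [\,\tfrac{1}{2} \;\; \tfrac{1}{2}\,]
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}
= [\,\tfrac{1}{2} \;\; 0 \;\; \tfrac{1}{2} \;\; 0\,]$$

$$\phi^*(1) =
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 0 \\ 1 \\ 1 \\ 0 \end{bmatrix}
+ \begin{bmatrix} 0 \\ 0 \\ 0 \\ k \end{bmatrix}
= \begin{bmatrix} 0 \\ 1 \\ 0 \\ k \end{bmatrix}.$$

Therefore,

$$\pi^*(0)\, P^{\gamma_1^*}(1)\, \phi^*(1) = 0, \qquad
\pi^*(1)\, P^{\gamma_2^*}(2)\, \phi^*(2) = 0.$$

Since all numbers in the problem are non-negative, $\gamma^* = (\gamma_1^*, \gamma_2^*)$
clearly satisfies the conditions of the minimum principle.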

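For the case $\Gamma_2 = \{\gamma_2 \in U_2^{X_1} : \gamma_2(1) = \gamma_2(2) = \gamma_2(3)\}$, the
optimal control laws are $\gamma_1^*(1) = 1$, $\gamma_1^*(2) = 0$, and
$\gamma_2^*(1) = \gamma_2^*(2) = \gamma_2^*(3) = 1$, $\gamma_2^*(4) = 0$. The corresponding
$\pi^*(1)$ and $\phi^*(1)$ are

$$\pi^*(1) = [\,\tfrac{1}{2} \;\; \tfrac{1}{2}\,]
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
= [\,\tfrac{1}{2} \;\; 0 \;\; 0 \;\; \tfrac{1}{2}\,]$$

$$\phi^*(1) =
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 0 \\ 1 \\ 1 \\ 0 \end{bmatrix}
+ \begin{bmatrix} 0 \\ 0 \\ 0 \\ k \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \\ 1 \\ k \end{bmatrix}.$$

Therefore,

$$\pi^*(0)\, P^{\gamma_1^*}(1)\, \phi^*(1) = \tfrac{1}{2} k, \qquad
\pi^*(1)\, P^{\gamma_2^*}(2)\, \phi^*(2) = 0.$$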

There are three other possible control laws at t=1, and at t=2. These
give $\pi^*(0)\, P^{\gamma_1}(1)\, \phi^*(1) = \tfrac{1}{2}k, \tfrac{1}{2}, \tfrac{1}{2}$ and
$\pi^*(1)\, P^{\gamma_2}(2)\, \phi^*(2) = 0, \tfrac{1}{2}, \tfrac{1}{2}$.
Therefore, the minimum principle is satisfied for k < 1.

Note, however, that the control law sequence $\bar{\gamma} = (\bar{\gamma}_1, \bar{\gamma}_2)$,
where $\bar{\gamma}_1 = 1$, $\bar{\gamma}_2 = 1$, also satisfies the minimum principle.
Since the control law sequence $\bar{\gamma}$ has
$J(\bar{\gamma}_1, \bar{\gamma}_2) = \tfrac{1}{2} > J(\gamma_1^*, \gamma_2^*) = \tfrac{1}{2}k$ (for k < 1),
$\bar{\gamma}$ is not optimal. This is a good illustration of the fact that
satisfaction of the minimum principle assures only that the control law
sequence cannot be improved by changing the control law at a single
stage. The optimal strategy $\gamma^*$ is a signaling strategy, so that
coordination is required: it is no use to employ the signaling control
law unless the second stage control law utilizes the information.
Conversely, a second stage control law that attempts to utilize signaling
information that is not forthcoming is worthless. The need to consider
signaling strategies is the fundamental reason why the study of
non-classical stochastic control is much more difficult than the study of
classical stochastic control.

3.3 Signaling and Sufficiency

The novelty of non-classical stochastic control is the presence of
signaling strategies. To explore the implications of this fact, it is
necessary to restrict attention to a certain subclass of FSFM problems.

Definition 3.3.1

The FSFM problem (2.1.1)-(2.1.2) is said to have a simple information
constraint if

$$\Gamma_t = \{\gamma_t \in U_t^{X_{t-1}} : \gamma_t^{-1}(\mathcal{U}_t) \subset \mathcal{F}_{t-1}\} \tag{3.3.1}$$

for t = 1, 2, ..., T, where $\mathcal{U}_t = P(U_t)$ and $\mathcal{F}_{t-1}$ is a subfield of
$\mathcal{X}_{t-1} = P(X_{t-1})$.

The reason for restricting attention to FSFM problems with simple
information constraints is that these problems can be readily identified
with a corresponding Kuhn model of an extensive game (see section 2.3
and reference [K2]).

Suppose that a FSFM problem with simple information constraint is
given. Let the sets $X_0, Q_1, U_1, Q_2, \ldots, U_T$ have
$n_0, n_1, m_1, n_2, \ldots, m_T$ elements, respectively. The rank 0 move^1 of
the corresponding game tree has $n_0$ alternatives. For $1 \le t \le T$, the
rank 2t-1 move has $n_t$ alternatives and the rank 2t move has $m_t$
alternatives. Thus every play has rank 2T+1 (Figure 3.3.1).

1 A move is a vertex of the game tree with alternatives; a play is a
(terminal) vertex without alternatives. The rank of a move or play is
the number of moves that precede it. See Kuhn [K2] for details.

Figure 3.3.1 Game Tree for FSFM Problem With Single Information Constraint
(levels of the tree, in order: choice of x(0), choice of q(1), choice of
u(1), choice of q(2), ..., choice of u(T))

The chance moves $P_0$ are the moves with rank 0, 1, 3, ..., 2T-1, and
the moves $P_1$ of player 1 (the only player) are the moves with rank
2, 4, ..., 2T. Each alternative of the initial (rank 0) move of the game
tree corresponds to an element of $X_0$. Similarly, the alternatives of
moves with rank 2t-1 correspond to elements of $Q_t$, and the alternatives
of moves with rank 2t correspond to elements of $U_t$.

Each information subset of $P_0$ contains a single point of $P_0$. The
information sets of $P_1$ are defined by the atoms* of $\mathcal{F}_t$ as follows.
Notice that the system equations (2.1.1) define a map

$$S_t : X_0 \times Q_1 \times U_1 \times \cdots \times Q_t \times U_t \to X_t \tag{3.3.2}$$

which takes an initial state and a sequence of inputs and gives the
corresponding state. Each atom F of $\mathcal{F}_t$ defines a set

$$\{(x(0), q(1), u(1), \ldots, q(t), u(t)) : S_t(x(0), q(1), u(1), \ldots, q(t), u(t)) \in F\}
\subset X_0 \times Q_1 \times U_1 \times \cdots \times Q_t \times U_t. \tag{3.3.3}$$

Since there is a one-to-one correspondence between the set
$X_0 \times Q_1 \times U_1 \times \cdots \times Q_t \times U_t$ and the moves of order 2t+1 of the
game, the partition induced on $X_0 \times Q_1 \times U_1 \times \cdots \times Q_t \times U_t$ by the
atoms of $\mathcal{F}_t$ induces a partition on the corresponding set of moves. Thus
each atom $F \in \mathcal{F}_t$ gives rise to a single information set for player one
containing moves of player 1. As a consequence, all the moves of a given
information set are of the same rank. This is not surprising, since the
problem is sequential [W2].

*An atom of a field $\mathcal{F}$ is a set $F \in \mathcal{F}$ such that if $E \in \mathcal{F}$ and
$E \subset F$, then either $E = \emptyset$ or $E = F$. The atoms of a finite field
always exist and form a partition [Hal].

To finish the specification of the game, the probabilities of the
chance moves must be defined and the terminal cost specified. If an
information set of $P_0$ contains a move of rank 2t-1, its alternative
corresponding to $q \in Q_t$ is chosen with probability $p_t(q)$. The terminal
cost is determined by the fact that the plays are in one-to-one
correspondence with $X_0 \times Q_1 \times U_1 \times \cdots \times Q_T \times U_T$. Thus each play
determines a complete state-control trajectory for which J can be
evaluated. This value of J is the cost associated with the play.

In game theory, a strategy for player 1 is the assignment of a single
alternative to each information set. For FSFM problems with simple
information constraint, a control law is the assignment of a point in
$U_t$ to each atom of $\mathcal{F}_{t-1}$ (since $\gamma_t$ is constrained to be measurable).
Because of the manner in which the information sets have been constructed
above, there is clearly a one-to-one correspondence between the control
laws of a FSFM problem with simple information constraint and its
corresponding extensive game form. Thus the same notation $\gamma$ will be
used to describe either a control law sequence or a strategy for the
equivalent extensive game.

The equivalence between the extensive game and FSFM forms of a problem
is best understood by example. Figure 3.3.2 illustrates the extensive
game form of the FSFM problem considered in the previous section when

$$\mathcal{F}_1 = \{\emptyset, \{1\}, \{2\}, X_0\}$$

and

$$\mathcal{F}_2 = \{\emptyset, \{1\}, \{2\}, \{3\}, \{4\}, \{1,2\}, \{1,3\}, \{1,4\}, \{2,3\},
\{2,4\}, \{3,4\}, \{1,2,3\}, \{2,3,4\}, \{1,3,4\}, \{1,2,4\}, X_1\}$$

(full state information). Figure 3.3.3 illustrates the extensive
game form when

$$\mathcal{F}_1 = \{\emptyset, \{1\}, \{2\}, X_0\}, \quad \mathcal{F}_2 = \{\emptyset, \{4\}, \{1,2,3\}, X_1\}.$$
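
The correspondence between atoms and admissible control laws can be
sketched as follows. The code is a hypothetical illustration (the function
names are not from the thesis): it extracts the atoms of a finite field of
sets as its minimal non-empty members, and then enumerates the measurable
control laws by assigning one control value to each atom. It is applied to
the coarser of the two fields on $X_1 = \{1,2,3,4\}$ given above.

```python
from itertools import product

def atoms(field):
    """Minimal non-empty members of a finite field of sets."""
    members = [frozenset(s) for s in field if s]
    return [a for a in members if not any(b < a for b in members)]

def measurable_laws(field, controls):
    """All maps from states to controls that are constant on each atom."""
    parts = atoms(field)
    return [{x: u for part, u in zip(parts, choice) for x in part}
            for choice in product(controls, repeat=len(parts))]

X1 = {1, 2, 3, 4}
coarse_field = [set(), {4}, {1, 2, 3}, X1]

parts = sorted(sorted(a) for a in atoms(coarse_field))
laws = measurable_laws(coarse_field, controls=(0, 1))
```

With two atoms and two control values there are four admissible control
laws, each constant on {1, 2, 3}, in agreement with the constraint
$\gamma_2(1) = \gamma_2(2) = \gamma_2(3)$ used in Example 2 of section 3.2.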

The equivalence between FSFM problems and extensive games can be 
extended to FSFM models with information constraint. 


Definition 3.3.2

The FSFM problem (2.1.1)-(2.1.2) is said to have an information
constraint if

$$U_t = U_t^1 \times U_t^2 \times \cdots \times U_t^m \tag{3.3.4}$$

$$\Gamma_t = \Gamma_t^1 \times \Gamma_t^2 \times \cdots \times \Gamma_t^m \tag{3.3.5}$$

for t = 1, 2, ..., T, where

$$\Gamma_t^i = \{\gamma_t^i \in (U_t^i)^{X_{t-1}} : (\gamma_t^i)^{-1}(\mathcal{U}_t^i) \subset \mathcal{F}_{t-1}^i\} \tag{3.3.6}$$

where $\mathcal{U}_t^i = P(U_t^i)$ and $\mathcal{F}_{t-1}^i$ is a subfield of $\mathcal{X}_{t-1} = P(X_{t-1})$.

Since the equivalence will not be used in the sequel, the construction
of a Kuhn extensive game model equivalent to the FSFM model with information
constraint will be omitted.

Since an equivalence has been established between FSFM models with
simple information constraint and Kuhn extensive game models, the notions
of signaling strategy and perfect recall can now be precisely defined.
The following definitions and propositions are stated for 1-player games,
but can be easily extended to n-person games.

Definition 3.3.3 [K2]

A move Z of player 1 (n=1) is called possible when playing $\gamma$ if it has
non-zero probability of occurring when the strategy $\gamma$ is used. An
information set I for player 1 is called relevant when playing $\gamma$ if
some $Z \in I$ is possible when playing $\gamma$.

Proposition 3.3.1

A move Z for player 1 is possible when playing $\gamma$ if and only if $\gamma$
chooses all alternatives on the path $W_Z$ from the origin to Z which are
incident at moves of player 1.^1

Proof

See reference [K2], page 201.

Definition 3.3.4 [K2]

A game G is said to have perfect recall if I is relevant when playing
$\gamma$ and $Z \in I$ implies that Z is possible when playing $\gamma$ for all I, Z
and $\gamma$.

Definition 3.3.5 [T1]

Let I be an information set for player 1, and let $I_u$ = {moves following
some move in I by alternative u}. Then I is a signaling information set
for player 1 if, for some u and some information set J of player 1,

$$I_u \cap J \neq \emptyset \quad \text{and} \quad J \not\subset I_u.$$

1 All chance moves are assumed to occur with non-zero probability.
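
Definition 3.3.5 translates directly into a set test. In the sketch below
the vertex labels are hypothetical stand-ins for the two-stage example of
section 2.4 ("a0" denotes the second-stage move reached after x(0)=1 and
u(1)=0, "b1" the move reached after x(0)=2 and u(1)=1, and so on); they
are not the labels used in Figure 2.4.2. The function checks whether some
set of followers $I_u$ meets an information set J without containing it.

```python
def is_signaling(followers_by_alternative, info_sets):
    """True if, for some alternative u and some information set J,
    I_u intersects J but does not contain J (Definition 3.3.5)."""
    for followers in followers_by_alternative.values():
        for J in info_sets:
            if followers & J and not J <= followers:
                return True
    return False

# First-stage information set I = {move reached when x(0) = 1}.
# I_u is the set of player moves that follow I by alternative u.
followers_of_I = {0: {"a0"}, 1: {"a1"}}

# Perfect state observation: every second-stage move is its own set.
perfect = [{"a0"}, {"a1"}, {"b0"}, {"b1"}]

# Imperfect observation: the player only learns whether the event
# "x(0) = 2 and u(1) = 0" occurred.
imperfect = [{"b0"}, {"a0", "a1", "b1"}]

signaling_perfect = is_signaling(followers_of_I, perfect)
signaling_imperfect = is_signaling(followers_of_I, imperfect)
```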

Proposition 3.3.2 [T1]

A game G has perfect recall if and only if player 1 has no signaling
information sets.

Proof

See reference [T1], page 268.

The following proposition is not valid for general games, but is a
special property of 1-person (stochastic control) problems.

Proposition 3.3.3

Let G be a 1-person game with perfect recall, and let I be an
arbitrary information set of the player. If I is not relevant when
playing $\gamma$, then the probability of any move in I is zero under $\gamma$. If
I is relevant when playing $\gamma$, then the probability of any move in I is
positive under $\gamma$. Moreover, if I is relevant under any other strategy
$\bar{\gamma}$, then the probabilities of any move of I under $\gamma$ and $\bar{\gamma}$ are
the same.

Proof

If I is not relevant when playing $\gamma$, then by definition no move of
I is possible when playing $\gamma$. Thus the probability of any such move is
zero when $\gamma$ is used.

If I is relevant when playing $\gamma$, then every move of I is possible
when playing $\gamma$ since G has perfect recall. Thus the probability of any
such move is positive when $\gamma$ is used.

If $Z \in I$ is possible when playing $\gamma$, by Proposition 3.3.1 $\gamma$ must
choose all alternatives on the path $W_Z$ from the origin to Z which are
incident at moves of player 1. All other alternatives on $W_Z$ are incident
at chance moves, and the probability of Z under $\gamma$ is simply the product
of the probabilities of these alternatives. But this probability is the
same for $\bar{\gamma}$, since $\bar{\gamma}$ likewise chooses all alternatives on the
path $W_Z$ incident at moves of player 1.

At this point, the preceding definitions and propositions are applied
to the FSFM problem.

Definition 3.3.6 

A FSFM stochastic control problem is said to have perfect recall if 
it has a simple information constraint and the corresponding extensive 
game has perfect recall. 

Definition 3.3.7 

A control law y^ for a FSFM problem with simple information constraint 
is said to be a signaling control law if an atom of F ^ gives rise to 
a signaling information set in the corresponding extensive game. 

Corollary 3.3.4 

A FSFM stochastic control problem with simple information constraint 
has perfect recall if and only if it has no signaling control laws. 

Proof 

This is a direct consequence of the definitions, the construction of 
the equivalent extensive game, and Proposition 3.3.2. 
Theorem 3.3.5

Suppose that a FSFM stochastic control problem with perfect recall
is given. Let A be an atom of $F_{t-1}$. Then, for any control law
sequence $\gamma$, either the probability of all states in A is zero, or the
probability of each state is a positive constant independent of $\gamma$.

Proof 

By construction, the probability of a state $x(t-1) \in A$ under $\gamma$ is
equal to the probability of the corresponding set of moves in the
information set I generated by A. Therefore, the theorem follows
immediately from Proposition 3.3.3.

The property of FSFM problems with perfect recall expressed by Theorem 
3.3.5 makes it possible to strengthen the minimum principle to achieve a 
sufficient condition for optimality. 

Definition 3.3.8 

Let the set of state probability vectors reachable at time t,
$1 \le t \le T$, when the initial state probability vector is $\pi_0$, be denoted

$$r_t(\pi_0) = \{\pi_0 P^{\gamma_1}(1) P^{\gamma_2}(2) \cdots P^{\gamma_t}(t) : \gamma_1 \in \Gamma_1,\ \gamma_2 \in \Gamma_2,\ \ldots,\ \gamma_t \in \Gamma_t\}. \tag{3.3.7}$$

$r_t(\pi_0)$ is called the reachable set ($r_0(\pi_0) = \{\pi_0\}$).
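The definition can be made concrete by enumerating the reachable sets of a small problem. The following Python sketch is an illustration only, under stated assumptions: the transition matrices are taken to be time-invariant, `P[u]` is the matrix for control u, a control law is a tuple giving the control applied in each state, and the names `law_matrix`, `reachable_sets` and the two-state example are hypothetical rather than taken from the text.

```python
import itertools
import numpy as np

def law_matrix(P, gamma):
    # Row i of P^gamma is row i of P^u with u = gamma[i].
    return np.array([P[gamma[i]][i] for i in range(len(gamma))])

def reachable_sets(pi0, P, laws, T):
    """Return [r_0, r_1, ..., r_T]; each r_t is a set of probability vectors (tuples)."""
    start = tuple(np.round(np.asarray(pi0, dtype=float), 12).tolist())
    levels = [{start}]
    for _ in range(T):
        nxt = set()
        for pi in levels[-1]:
            for gamma in laws:
                new_pi = np.asarray(pi) @ law_matrix(P, gamma)
                # Rounding merges vectors that differ only by floating-point noise.
                nxt.add(tuple(np.round(new_pi, 12).tolist()))
        levels.append(nxt)
    return levels

# Hypothetical example: control 0 keeps the state, control 1 switches it.
P = [np.eye(2), np.array([[0.0, 1.0], [1.0, 0.0]])]
all_laws = list(itertools.product([0, 1], repeat=2))  # every map from X to U
levels = reachable_sets([0.3, 0.7], P, all_laws, T=2)
print([len(r) for r in levels])
```

In this example distinct control law sequences lead to the same probability vector, so the reachable sets stay much smaller than the number of control law sequences.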

Definition 3.3.9 

Suppose that the control law sequence $\gamma^* = (\gamma_1^*, \gamma_2^*, \ldots, \gamma_T^*)$
satisfies the condition

$$\pi(t-1) P^{\gamma_t^*}(t) \phi^*(t) + \pi(t-1) h^{\gamma_t^*}(t) \le \pi(t-1) P^{\gamma_t}(t) \phi^*(t) + \pi(t-1) h^{\gamma_t}(t) \tag{3.3.8}$$

for all $\gamma_t \in \Gamma_t$, for all $\pi(t-1) \in r_{t-1}(\pi_0)$, where

$$\phi^*(t-1) = P^{\gamma_t^*}(t) \phi^*(t) + h^{\gamma_t^*}(t) \tag{3.3.9}$$

for t = 1, 2, ..., T ($\phi^*(T) = \phi_T$). Then $\gamma^*$ is said to be universally
extremal.


Lemma 3.3.6 

Any universally extremal control law sequence is optimal. 


Proof 


The proof proceeds by induction on the number of stages T.

Suppose T = 1. Then

$$J(\gamma_1) = \pi(0) h^{\gamma_1}(1) + \pi(1) \phi(1) = \pi(0) h^{\gamma_1}(1) + \pi(0) P^{\gamma_1}(1) \phi(1) \tag{3.3.10}$$

so that any extremal is optimal.

Suppose the lemma is valid for problems with T-1 stages. It must be
established that the lemma is valid for problems with T stages.

Assume that $(\gamma_1^*, \gamma_2^*, \ldots, \gamma_T^*)$ is universally extremal. It
follows immediately that $(\gamma_2^*, \gamma_3^*, \ldots, \gamma_T^*)$ is universally extremal for
the problem with cost

$$J(\gamma_2, \ldots, \gamma_T; \pi(1)) = \sum_{t=2}^{T} \pi(t-1) h^{\gamma_t}(t) + \pi(T) \phi(T) \tag{3.3.11}$$

for any $\pi(1) \in r_1(\pi_0)$. Therefore, by the induction hypothesis,

$$J(\gamma_2^*, \ldots, \gamma_T^*; \pi(1)) \le J(\gamma_2, \ldots, \gamma_T; \pi(1)) \tag{3.3.12}$$

for all $\pi(1) \in r_1(\pi_0)$ and for all $\gamma_2 \in \Gamma_2, \ldots, \gamma_T \in \Gamma_T$. Moreover, since

$$J(\gamma_1, \gamma_2, \ldots, \gamma_T) = \pi(0) h^{\gamma_1}(1) + J(\gamma_2, \ldots, \gamma_T; \pi(0) P^{\gamma_1}(1)) \tag{3.3.13}$$

it follows that

$$J(\gamma_1, \gamma_2^*, \ldots, \gamma_T^*) \le J(\gamma_1, \gamma_2, \ldots, \gamma_T) \tag{3.3.14}$$

for all $\gamma_1 \in \Gamma_1, \gamma_2 \in \Gamma_2, \ldots, \gamma_T \in \Gamma_T$.

But the assumption that $(\gamma_1^*, \gamma_2^*, \ldots, \gamma_T^*)$ is universally extremal
implies that

$$J(\gamma_1^*, \gamma_2^*, \ldots, \gamma_T^*) = \pi(0) h^{\gamma_1^*}(1) + \pi(0) P^{\gamma_1^*}(1) \phi^*(1) \le \pi(0) h^{\gamma_1}(1) + \pi(0) P^{\gamma_1}(1) \phi^*(1) = J(\gamma_1, \gamma_2^*, \ldots, \gamma_T^*) \tag{3.3.15}$$

for all $\gamma_1 \in \Gamma_1$. The lemma follows from (3.3.15) and (3.3.14).

Notice from the proof of Lemma 3.3.6 that the existence of a universally
extremal control law sequence $\gamma^*$ implies the unusual fact that the
problems

$$\min_{\gamma_t \in \Gamma_t} \cdots \min_{\gamma_T \in \Gamma_T} J(\gamma_1, \ldots, \gamma_{t-1}, \gamma_t, \ldots, \gamma_T) \tag{3.3.16}$$

for $\gamma_1 \in \Gamma_1, \ldots, \gamma_{t-1} \in \Gamma_{t-1}$ have a common solution $(\gamma_t^*, \ldots, \gamma_T^*)$.

Thus the existence of a universal extremal would seem to be rather unlikely.
From this viewpoint, the following property of FSFM problems with perfect
recall seems rather remarkable.

Theorem 3.3.7 

Every FSFM problem with perfect recall has a universally extremal 
control law sequence. 

Proof 

The proof is constructive. The control laws $\gamma_t^*$ are defined by
choosing their values on the atoms of $F_{t-1}$.

Consider the case t = T. Let $A^p_{T-1}$ be an atom of $F_{T-1}$, p = 1, 2, ..., P.
For simplicity of notation, suppose that $A^1_{T-1}$ contains the first $\ell_1$
states of $X_{T-1}$, $A^2_{T-1}$ contains states $\ell_1 + 1$ through $\ell_2$ of $X_{T-1}$, etc.
Notice that

$$\pi(T-1) P^{\gamma_T}(T) \phi(T) + \pi(T-1) h^{\gamma_T}(T) = \sum_{p=1}^{P} \sum_{j=\ell_{p-1}+1}^{\ell_p} \pi_j(T-1) \left[ \sum_{k=1}^{n} P^{u_p(T)}_{jk}(T) \phi_k(T) + h^{u_p(T)}_j(T) \right] \tag{3.3.17}$$

where n is the number of states in $X_T$, $\ell_0 = 0$, and $u_p(T)$ is the value of
$\gamma_T$ on the pth atom of $F_{T-1}$.

The decomposition (3.3.17) makes the construction of $\gamma_T^*$ clear. By
Theorem 3.3.5, every vector $\pi(T-1) \in r_{T-1}(\pi_0)$ either has $\pi_j(T-1) = 0$,
$j = \ell_{p-1} + 1, \ldots, \ell_p$, or has $\pi_j(T-1) = \hat{\pi}_j(T-1)$, $j = \ell_{p-1} + 1, \ldots, \ell_p$,
where each $\hat{\pi}_j(T-1)$ is a fixed number. Therefore, $\gamma_T^*$ takes the
value $u_p^*(T)$ on the pth atom of $F_{T-1}$, where

$$\min_{u \in U_T} \sum_{j=\ell_{p-1}+1}^{\ell_p} \hat{\pi}_j(T-1) \left[ \sum_{k=1}^{n} P^{u}_{jk}(T) \phi_k(T) + h^{u}_j(T) \right] = \sum_{j=\ell_{p-1}+1}^{\ell_p} \hat{\pi}_j(T-1) \left[ \sum_{k=1}^{n} P^{u_p^*(T)}_{jk}(T) \phi_k(T) + h^{u_p^*(T)}_j(T) \right]. \tag{3.3.18}$$

The construction of the remaining $\gamma_t^*$ is completed by applying an
analogous procedure to

$$\pi(t-1) P^{\gamma_t}(t) \phi^*(t) + \pi(t-1) h^{\gamma_t}(t). \tag{3.3.19}$$

Theorem 3.3.7 is primarily of theoretical and conceptual importance.
Problems with perfect recall are more efficiently handled by deriving an
equivalent deterministic problem that has a conditional probability vector
for the deterministic state. (The conditioning is with respect to the
field $F_t$.) Special cases of this procedure are implicit in the usual
stochastic dynamic programming algorithm [Ao1, St1, As1] and the
algorithm of Sandell and Athans for the 1-step delay problem [S1].
3.4 Finite-Set Team Problem

A minimum principle was derived in section 3.1, and its properties
considered in the following sections. It is interesting to see if
feasible numerical algorithms based on the minimum principle can be
developed. A start in this direction is contained in the present section,
where the numerical solution of the finite set team problem is considered.

In the sequel, the convenient notation

$$\int_X f(x)\, d\pi(x) = \sum_{x \in X} f(x)\, \pi(x)$$

will be used.

Let $X, U_1, U_2, \ldots, U_p$ be finite sets with $n, m_1, m_2, \ldots, m_p$
elements, respectively.

Let $\pi$ be a probability measure on X, and $h : X \times U_1 \times U_2 \times \cdots \times U_p \to R$
a given real-valued function.


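The finite set team problem¹ is:

$$\min_{\gamma_1 \in \Gamma_1} \min_{\gamma_2 \in \Gamma_2} \cdots \min_{\gamma_p \in \Gamma_p} \int_X h(x, \gamma_1(x), \gamma_2(x), \ldots, \gamma_p(x))\, d\pi(x) \tag{3.4.1}$$

where $\Gamma_1 \subset U_1^X$, $\Gamma_2 \subset U_2^X$, ..., $\Gamma_p \subset U_p^X$.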


Note that for given $\gamma_1, \gamma_2, \ldots, \gamma_p$ the integral above can be computed
with n-1 multiplications and n-1 additions. Therefore the finite set
team problem can be solved with at most

$(n-1) \cdot m_1^n \cdot m_2^n \cdots m_p^n$ multiplications,
$(n-1) \cdot m_1^n \cdot m_2^n \cdots m_p^n$ additions,
$m_1^n \cdot m_2^n \cdots m_p^n$ comparisons,

since $\Gamma_1 \times \Gamma_2 \times \cdots \times \Gamma_p$ has at most $m_1^n m_2^n \cdots m_p^n$ elements. There is
also overhead required to compute $h(x, \gamma_1(x), \gamma_2(x), \ldots, \gamma_p(x))$ for given
$x, \gamma_1, \gamma_2, \ldots, \gamma_p$. However, this is ignored in the following discussion.

¹ When $\Gamma_i = \{\gamma_i \in U_i^X : \gamma_i^{-1}(\mathcal{U}_i) \subset F^i\}$, where $\mathcal{U}_i = P(U_i)$, and $F^i$ is a subfield
of $\mathcal{X} = P(X)$, for i = 1, ..., p, then the finite set team problem is a
special case of the more general formulation of Marschak and Radner [M2].
Of course, in this case the finite set team problem is a FSFM problem
with information constraint.

For certain special cases, one can do better. 


Case 1 Perfect state observation.

In this case, $\Gamma_i = U_i^X$ for i = 1, ..., p.

It is easy to see that the problem is solved by computing

$$\min_{u_1 \in U_1} \min_{u_2 \in U_2} \cdots \min_{u_p \in U_p} h(x, u_1, u_2, \ldots, u_p) \tag{3.4.2}$$

for each $x \in X$. Therefore, the problem is solved with

No multiplications,
No additions,
n sets of $m_1 \cdot m_2 \cdots m_p$ comparisons.

This is a considerable saving. However, the problem is of limited
interest since the stochastic aspect of the problem is trivial. Notice
that the probability measure $\pi$ does not affect the solution at all.


Case 2 Common measurement.

In this case, the admissible control laws are measurable with respect
to the field F determined by a given finite partition $\{A_1, A_2, \ldots, A_k\}$
of X. This means that each control law is constant on each atom $A_i \subset X$
of the field F.

Notice that

$$\int_X h(x, \gamma_1(x), \gamma_2(x), \ldots, \gamma_p(x))\, d\pi(x) = \sum_{i=1}^{k} \int_{A_i} h(x, \gamma_1(x), \gamma_2(x), \ldots, \gamma_p(x))\, d\pi(x). \tag{3.4.3}$$

Since each $\gamma_j$ is constant on $A_i$, the problem can be solved by k
minimizations of the form

$$\min_{u_1} \min_{u_2} \cdots \min_{u_p} \int_{A_i} h(x, u_1, u_2, \ldots, u_p)\, d\pi(x). \tag{3.4.4}$$

Each such problem requires

$(\ell_i - 1) \cdot m_1 \cdot m_2 \cdots m_p$ multiplications,
$(\ell_i - 1) \cdot m_1 \cdot m_2 \cdots m_p$ additions,
$m_1 \cdot m_2 \cdots m_p$ comparisons,

where $\ell_i$ = number of elements of X in atom $A_i$ of F. Note that
$\sum_{i=1}^{k} \ell_i = n$. Therefore a total of

$(n-k) \cdot m_1 \cdot m_2 \cdots m_p$ multiplications,
$(n-k) \cdot m_1 \cdot m_2 \cdots m_p$ additions,
k sets of $m_1 \cdot m_2 \cdots m_p$ comparisons

are required.
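The atom-by-atom decomposition (3.4.3)-(3.4.4) can be sketched in a few lines of Python. This is a minimal illustration, not the thesis' implementation: the function name `solve_common_measurement`, the representation of a joint control as a tuple, and the four-state example with its cost function are all assumptions made for the sketch.

```python
import itertools

def solve_common_measurement(pi, h, atoms, controls):
    """Minimize sum_x h(x, u) * pi[x] separately on each atom of the partition.

    pi       : list of state probabilities
    h        : cost function h(x, u), with u a tuple (u_1, ..., u_p)
    atoms    : list of lists of states (the partition A_1, ..., A_k)
    controls : list of control sets [U_1, ..., U_p]
    """
    law, total = {}, 0.0
    for atom in atoms:
        best_u, best_cost = None, float("inf")
        for u in itertools.product(*controls):
            # Unnormalized integral over the atom, as in (3.4.4).
            cost = sum(h(x, u) * pi[x] for x in atom)
            if cost < best_cost:
                best_u, best_cost = u, cost
        for x in atom:
            law[x] = best_u      # every control law is constant on the atom
        total += best_cost
    return law, total

# Hypothetical example: 4 states, 2 controllers with binary controls.
pi = [0.1, 0.2, 0.3, 0.4]
h = lambda x, u: (x - u[0] - 2 * u[1]) ** 2
law, total = solve_common_measurement(pi, h, [[0, 1], [2, 3]], [[0, 1], [0, 1]])
print(law, total)
```

Each atom is handled independently, which is why the operation count grows with the number of joint control values rather than with the number of control laws.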

This problem corresponds to the usual Bayesian statistical decision
problem. Such problems are usually treated by a posteriori analysis
[Ra1]. That is, the quantity

$$E\{h(x, \gamma_1(x), \ldots, \gamma_p(x)) \mid A_i\} = E\{h(x, u_1, \ldots, u_p) \mid A_i\} \tag{3.4.5}$$

where $\gamma_1(x) = u_1, \gamma_2(x) = u_2, \ldots, \gamma_p(x) = u_p$, for all $x \in A_i$, is
minimized for each $A_i \in F$. Note that

$$E\{h(x, u_1, \ldots, u_p) \mid A_i\} = \frac{\int_{A_i} h(x, u_1, \ldots, u_p)\, d\pi(x)}{\int_{A_i} d\pi(x)}. \tag{3.4.6}$$

The probability $\pi$ is normalized to give the conditional probability on $A_i$
in the Bayesian formulation, but this is unnecessary. Therefore, a
posteriori analysis is equivalent to the preceding analysis.


Case 3 Team decision problem.

In this case, $\Gamma_i$ consists of control laws measurable with respect to
the field $F^i$ generated by the partition

$$\{A^i_1, A^i_2, \ldots, A^i_{k_i}\}.$$

Clearly, there are

$$m_1^{k_1} m_2^{k_2} \cdots m_p^{k_p}$$

possible control laws. Evidently, then,

$(n-1) \cdot m_1^{k_1} \cdot m_2^{k_2} \cdots m_p^{k_p}$ multiplications,
$(n-1) \cdot m_1^{k_1} \cdot m_2^{k_2} \cdots m_p^{k_p}$ additions,
$m_1^{k_1} \cdot m_2^{k_2} \cdots m_p^{k_p}$ comparisons

are required to solve the problem.

This figure can be improved upon, but only slightly. For simplicity,
assume $\max_{1 \le i \le p} m_i^{k_i} = m_p^{k_p}$.

If $\gamma_1, \gamma_2, \ldots, \gamma_{p-1}$ are given, then

$$\min_{\gamma_p} \int_X h(x, \gamma_1(x), \gamma_2(x), \ldots, \gamma_p(x))\, d\pi(x) \tag{3.4.7}$$

can be computed as in case 2. This gives $\gamma_p^*(\cdot\,; \gamma_1, \gamma_2, \ldots, \gamma_{p-1})$.

Then

$$\min_{\gamma_1} \min_{\gamma_2} \cdots \min_{\gamma_{p-1}} \int_X h(x, \gamma_1(x), \ldots, \gamma_{p-1}(x), \gamma_p^*(x; \gamma_1, \ldots, \gamma_{p-1}))\, d\pi(x) \tag{3.4.8}$$

can be computed. However this procedure cannot be iterated further, since
$\gamma_p^*$ depends on the entire functions $\gamma_1, \gamma_2, \ldots, \gamma_{p-1}$. In other words,
attempting to solve

$$\min_{u_{p-1}} \int_{A_i^{p-1}} h(x, \gamma_1(x), \ldots, u_{p-1}, \gamma_p^*(x; \gamma_1, \ldots, \gamma_{p-1}))\, d\pi(x) \tag{3.4.9}$$

will not work, since changing the value of $\gamma_{p-1}$ on the atom $A_i^{p-1}$
changes $\gamma_p^*$. But $\gamma_p^*$ affects the value of the preceding integral
over $X - A_i^{p-1}$. Therefore $\gamma_{p-1}^*$ cannot be obtained by independent
optimizations of integrals over atoms of $F^{p-1}$; these optimizations are
coupled through $\gamma_p^*$.

Therefore, the best that can be done is

$(n-k_p) \cdot m_p \cdot m_1^{k_1} \cdot m_2^{k_2} \cdots m_{p-1}^{k_{p-1}}$ multiplications,
$(n-k_p) \cdot m_p \cdot m_1^{k_1} \cdot m_2^{k_2} \cdots m_{p-1}^{k_{p-1}}$ additions,
$k_p \cdot m_1^{k_1} \cdot m_2^{k_2} \cdots m_{p-1}^{k_{p-1}}$ sets of $m_p$ comparisons.


This gets formidable very fast. Suppose:

p = 3 (3 controllers)
$m_1 = m_2 = m_3 = 10$ (10 controls)
n = 100 (100 states)
$k_1 = k_2 = k_3 = 2$ (2 observations)

Assuming that a floating point multiplication requires $10^{-5}$ seconds to
perform, and that a floating point addition requires $10^{-6}$ seconds¹, about
110 seconds of central processing unit time of a modern high-speed
computer are required just to perform the additions and multiplications.
If there are three observations, this increases to $1.1 \times 10^4$ seconds ≈
3.0 hrs! Thus even problems of a rather modest size tax the capabilities
of modern computers.

¹ These numbers are approximately correct for the IBM 370/165.
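The timing estimate can be checked by evaluating the operation count $(n-k_p) \cdot m_p \cdot m_1^{k_1} \cdots m_{p-1}^{k_{p-1}}$ directly. The short Python sketch below does this for the symmetric example (all $m_i = m$, all $k_i = k$); the function name and its parameters are hypothetical, and the per-operation times are the $10^{-5}$ and $10^{-6}$ seconds assumed in the text.

```python
def exhaustive_seconds(n, m, k, p, t_mul=1e-5, t_add=1e-6):
    """CPU time for the multiplications and additions of the reduced search.

    Assumes every controller has m controls and k observations, so the
    count is (n - k) * m * m**(k * (p - 1)) for each kind of operation.
    """
    ops = (n - k) * m * m ** (k * (p - 1))
    return ops * (t_mul + t_add)

two_obs = exhaustive_seconds(n=100, m=10, k=2, p=3)
three_obs = exhaustive_seconds(n=100, m=10, k=3, p=3)
print(f"{two_obs:.1f} s, {three_obs:.0f} s = {three_obs / 3600:.2f} h")
```

Under these assumptions the count gives roughly 108 seconds for two observations and roughly $1.07 \times 10^4$ seconds (about 3 hours) for three, a hundredfold increase from one extra observation per controller.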

Clearly, some approach other than exhaustive elimination must be 
employed. One such approach is the following algorithm. 

Algorithm 3.4.1

1. Guess control laws $\gamma_1^0, \gamma_2^0, \ldots, \gamma_p^0$. Compute

$$J^0 = \int_X h(x, \gamma_1^0(x), \gamma_2^0(x), \ldots, \gamma_p^0(x))\, d\pi(x).$$

Set i = 1, j = 0.

2. Solve the problem

$$\tilde{J} = \min_{\gamma_i \in \Gamma_i} \int_X h(x, \gamma_1(x), \gamma_2(x), \ldots, \gamma_p(x))\, d\pi(x)$$

where $\gamma_k = \gamma_k^j$ for k > i and $\gamma_k = \gamma_k^{j+1}$ for k < i.

Let some minimizing $\gamma_i$ above be denoted $\gamma_i^{j+1}$. (Keep $\gamma_i^j$ if it is
a minimizing control law.)

If i < p, set i = i+1 and return to 2.

If i = p, check $\tilde{J} < J^j$.

If $\tilde{J} < J^j$, set $J^{j+1} = \tilde{J}$, j = j+1, i = 1 and return to 2.

If $\tilde{J} = J^j$, stop.
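The iteration above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the author's code: control laws are stored as tuples indexed by state, the helper names (`expected_cost`, `admissible_laws`, `person_by_person`) are hypothetical, and the two-controller example is invented to show that the procedure can stop at a point that is not globally optimal.

```python
import itertools

def expected_cost(pi, h, laws):
    # J = sum_x pi(x) h(x, gamma_1(x), ..., gamma_p(x))
    return sum(pi[x] * h(x, tuple(g[x] for g in laws)) for x in range(len(pi)))

def admissible_laws(partition, controls, n):
    """All control laws that are constant on each atom of the partition."""
    out = []
    for values in itertools.product(controls, repeat=len(partition)):
        law = [None] * n
        for atom, u in zip(partition, values):
            for x in atom:
                law[x] = u
        out.append(tuple(law))
    return out

def person_by_person(pi, h, partitions, controls, init):
    laws = list(init)
    J = expected_cost(pi, h, laws)
    while True:
        for i in range(len(laws)):               # minimize over one controller at a time
            best, best_cost = laws[i], expected_cost(pi, h, laws)
            for cand in admissible_laws(partitions[i], controls[i], len(pi)):
                c = expected_cost(pi, h, laws[:i] + [cand] + laws[i + 1:])
                if c < best_cost:                # strict: keep the current law on ties
                    best, best_cost = cand, c
            laws[i] = best
        J_new = expected_cost(pi, h, laws)
        if J_new < J:
            J = J_new
        else:
            return laws, J

# Hypothetical example: two controllers, no observations, a coordination cost.
table = {(0, 0): 1.0, (1, 1): 0.0, (0, 1): 2.0, (1, 0): 2.0}
h = lambda x, u: table[u]
pi, parts, ctrls = [0.5, 0.5], [[[0, 1]], [[0, 1]]], [[0, 1], [0, 1]]
_, J_stuck = person_by_person(pi, h, parts, ctrls, [(0, 0), (0, 0)])
_, J_good = person_by_person(pi, h, parts, ctrls, [(1, 1), (1, 1)])
print(J_stuck, J_good)
```

Starting from the joint choice (0, 0), neither controller can lower the cost alone, so the iteration stops with cost 1 even though the joint choice (1, 1) has cost 0.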


Definition [M2]

A set of control laws $\gamma_1^*, \ldots, \gamma_p^*$ is person-by-person optimal
for the team decision problem if

$$\int_X h(x, \gamma_1^*(x), \ldots, \gamma_{i-1}^*(x), \gamma_i^*(x), \gamma_{i+1}^*(x), \ldots, \gamma_p^*(x))\, d\pi(x) = \min_{\gamma_i \in \Gamma_i} \int_X h(x, \gamma_1^*(x), \ldots, \gamma_{i-1}^*(x), \gamma_i(x), \gamma_{i+1}^*(x), \ldots, \gamma_p^*(x))\, d\pi(x), \quad i = 1, \ldots, p. \tag{3.4.10}$$

The interpretation of person-by-person optimality is that no team 
member can unilaterally decrease the cost. Thus person-by-person 
optimality is a necessary, but not sufficient, condition for optimality. 


Theorem 3.4.1 

After a finite number of steps, the preceding algorithm converges
to a person-by-person optimal solution.


Proof 

Let the set of numbers J such that

$$J = \int_X h(x, \gamma_1(x), \ldots, \gamma_p(x))\, d\pi(x) \tag{3.4.11}$$

for some $\gamma_1 \in \Gamma_1, \ldots, \gamma_p \in \Gamma_p$ be denoted S. Since S is finite (it
has $\le m_1^{k_1} m_2^{k_2} \cdots m_p^{k_p}$ elements) its elements can be arranged in
descending order,

$$S = (J_1, J_2, \ldots, J_\ell), \quad J_i > J_{i+1}.$$

Consider the set of positive numbers

$$T = (J_1 - J_2, \ldots, J_{\ell-1} - J_\ell)$$

and let $\epsilon = \inf T$. Note that $\epsilon > 0$.

Consider the difference $J^j - \tilde{J}$ defined in the previous algorithm.
Clearly, either $J^j - \tilde{J} = 0$, or $J^j - \tilde{J} \ge \epsilon$. By induction,

$$J^j \le J^0 - j\epsilon. \tag{3.4.12}$$

Therefore eventually $J^j = \tilde{J}$, since inf S is finite. But $J^j = \tilde{J}$
implies that $(\gamma_1^{j+1}, \ldots, \gamma_p^{j+1})$ is person-by-person optimal.

The algorithm requires

$\sum_{i=1}^{p} (n-k_i)\, m_i$ multiplications,
$\sum_{i=1}^{p} (n-k_i)\, m_i$ additions,
$k_1$ sets of $m_1$ comparisons, ..., $k_p$ sets of $m_p$ comparisons per iteration.

Thus the previous example requires ≈ .033 seconds per iteration for
2 observations, and slightly less time for 3 observations.

The algorithm will always improve a suboptimal strategy, unless that
strategy is already person-by-person optimal. It will not produce a
globally optimal strategy in general, however. Thus the algorithm is
a reasonable approach to the problem. The approach is similar in
philosophy to using a gradient algorithm to solve a nonlinear
optimization problem.
3.5 The Min-H Algorithm

A substantial number of numerical algorithms have been suggested
for the solution of deterministic optimal control problems. The most
natural of these for the FSFM problem is the min-H algorithm, which is
intimately related to the minimum principle. The min-H algorithm was
initially suggested by Kelley [Ke1]. Platzman [Pl1] has shown that the
algorithm is equivalent to Howard's policy iteration method for Markovian
decision processes, and has suggested its application to the imperfect
state information case of that problem.

To simplify the notation, the sets $X_t$ and $U_t$ are assumed to have
a constant cardinality for $0 \le t \le T$.


Algorithm 3.5.1 (Min-H)

1. Guess $\gamma_1^0, \gamma_2^0, \ldots, \gamma_T^0$. Set j = 0.

2. Compute $\phi^j(T), \phi^j(T-1), \ldots, \phi^j(1)$ using $\gamma_T^j, \ldots, \gamma_1^j$ in the
adjoint equation ($\phi^j(T) = \phi_T$). Set t = 1.

3. Choose $\gamma_t^{j+1}$ to minimize¹

$$\pi^{j+1}(t-1) P^{\gamma_t}(t) \phi^j(t) + \pi^{j+1}(t-1) h^{\gamma_t}(t)$$

($\pi^{j+1}(0) = \pi_0$).

4. If t < T, compute

$$\pi^{j+1}(t) = \pi^{j+1}(t-1) P^{\gamma_t^{j+1}}(t).$$

Set t = t+1, and go to 3.

5. If t = T, test $J^{j+1} < J^j$, where

$$J^j = \sum_{t=1}^{T} \pi^j(t-1) h^{\gamma_t^j}(t) + \pi^j(T) \phi_T.$$

If $J^{j+1} < J^j$, set j = j+1, t = 0, and go to 2.

If $J^{j+1} = J^j$, stop.

¹ If $\gamma_t^{j+1}$ is not unique, choose arbitrarily but with preference for $\gamma_t^j$
if it is in the minimizing set.
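The steps of the min-H iteration can be sketched in Python. The sketch makes several assumptions that are not in the text: the matrices $P^u$ and cost vectors $h^u$ are time-invariant and stored as NumPy arrays `P[u, i, j]` and `h[u, i]`, a control law is a tuple giving the control used in each state, ties are broken with a small tolerance `tol`, and all function names and the two-state example are hypothetical.

```python
import itertools
import numpy as np

def rows(P, h, gamma):
    """P^gamma and h^gamma: row i is taken from control gamma[i]."""
    g, idx = np.asarray(gamma), np.arange(len(gamma))
    return P[g, idx, :], h[g, idx]

def total_cost(pi0, P, h, phi_T, seq):
    pi, J = pi0, 0.0
    for gamma in seq:
        Pg, hg = rows(P, h, gamma)
        J += pi @ hg
        pi = pi @ Pg
    return J + pi @ phi_T

def min_h(pi0, P, h, phi_T, laws, seq, tol=1e-12):
    seq, T = list(seq), len(seq)
    J = total_cost(pi0, P, h, phi_T, seq)
    while True:
        # Step 2: backward sweep of the adjoint equation.
        phi = [None] * (T + 1)
        phi[T] = phi_T
        for t in range(T, 0, -1):
            Pg, hg = rows(P, h, seq[t - 1])
            phi[t - 1] = Pg @ phi[t] + hg
        # Steps 3-4: forward sweep, minimizing the Hamiltonian stage by stage.
        pi, new_seq = pi0, []
        for t in range(1, T + 1):
            def H(gamma):
                Pg, hg = rows(P, h, gamma)
                return pi @ (Pg @ phi[t] + hg)
            best = seq[t - 1]                  # prefer the current law on ties
            for cand in laws:
                if H(cand) < H(best) - tol:
                    best = cand
            new_seq.append(best)
            pi = pi @ rows(P, h, best)[0]
        # Step 5: stop when the cost no longer decreases.
        J_new = total_cost(pi0, P, h, phi_T, new_seq)
        if J_new < J - tol:
            seq, J = new_seq, J_new
        else:
            return seq, J

# Hypothetical example: control 0 keeps the state, control 1 switches it.
P = np.array([np.eye(2), [[0.0, 1.0], [1.0, 0.0]]])
h = np.array([[1.0, 0.0], [0.5, 0.5]])
phi_T, pi0 = np.array([2.0, 0.0]), np.array([0.5, 0.5])
laws = list(itertools.product([0, 1], repeat=2))
seq, J = min_h(pi0, P, h, phi_T, laws, [(0, 0), (0, 0)])
print(seq, J)
```

In this small example one sweep lowers the cost from 2.0 to 0.25, and the second sweep confirms that no single-stage change improves it further.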

Theorem 3.5.1 

The preceding algorithm converges in a finite number of steps
to an extremal solution.

Proof 

The proof is completely analogous to the proof of Theorem 3.4.1. 

The reason for the strong analogy between the algorithms of
section 3.4 and this section is that both are embodiments of the method
of orthogonal search. The method of orthogonal search applies to the
problem

$$\min_{x_1} \cdots \min_{x_n} h(x_1, \ldots, x_n). \tag{3.5.1}$$

The procedure is to fix all of the variables but one and to minimize
over that variable. This is done repeatedly, so that the cost decreases
monotonically. Convergence is (essentially) assured, but the
convergence will not in general be to the optimal solution without
further assumptions (e.g., convexity of h).

It is important to note that, as applied to this problem, the min-H
algorithm is exactly equivalent to orthogonal search. This is a
consequence of the fact that the dynamics and cost are linear in $\pi_t$. The
more general case in optimal control theory is that the Hamiltonian
gives only a linear approximation to the optimal cost-to-go. To quote
Kelley [Ke1]:

"In adopting the control y = y*(t) generated by min H* as our next
approximation, we must risk the violation of our linearizing assumptions,
for this may represent a large step process."

This difficulty will not occur in the preceding algorithm.

Notice that at each iteration, the quantity

$$\pi(t-1)(P^{\gamma_t}(t) \phi(t) + h^{\gamma_t}(t)) \tag{3.5.2}$$

must be computed for all $\gamma_t \in \Gamma_t$. Evidently each such computation
requires

$n^2 + n - 1$ multiplications
$n^2 + n - 1$ additions

for each $\gamma_t \in \Gamma_t$. Since the number of $\gamma_t$ is on the order of $m^n$
(m = # controls, n = # states), this computation appears to be hopeless
for even moderately sized problems.

However, a deeper look at the structure of the problem shows that the
situation can be improved considerably. Define

$$P^u_{ij}(t) = p_t(\{q : j = f_t(i, u, q)\}), \quad 1 \le i, j \le n. \tag{3.5.3}$$

It follows that

$$P^{\gamma}_{ij}(t) = P^u_{ij}(t), \quad j = 1, \ldots, n \tag{3.5.4}$$

when $u = \gamma(i)$. Therefore, for all $\gamma \in \Gamma_t$, each row of $P^\gamma(t)$ is a row of
$P^u(t)$ for some $u \in U_t$. There are precisely nm such rows. Therefore the
column vectors

$$P^u(t) \phi(t) + h^u(t)$$

can be computed with $mn^2$ multiplications and then the column vectors

$$P^\gamma(t) \phi(t) + h^\gamma(t)$$

can be formed by selecting the appropriate elements from the set
$\{P^u(t) \phi(t) + h^u(t)\}$.

Thus the quantities

$$\pi(t-1)(P^\gamma(t) \phi(t) + h^\gamma(t))$$

can be computed with 
2 

2 mn multiplications 

2 m 11 " 1 -! 

mn (n-1) +n + n :: additions 

in-1 

(assuming $\Gamma_t$ has $m^n$ elements). This is a considerable improvement,
especially since multiplications take approximately 10 times longer to
perform than additions. However, the number of additions is still too
large. Further improvement can only be made in the light of assumptions
on the nature of $\Gamma_t$.
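The row-selection idea can be illustrated with a short Python sketch: the m·n numbers $\pi_i [\sum_j P^u_{ij} \phi_j + h^u_i]$ are computed once, and the Hamiltonian of any control law is then a sum of n table entries. The array layout `P[u, i, j]`, `h[u, i]`, the function names, and the numerical values are assumptions made for the illustration.

```python
import itertools
import numpy as np

def hamiltonian_table(pi, P, h, phi):
    """table[u, i] = pi[i] * (sum_j P[u, i, j] * phi[j] + h[u, i])."""
    q = P @ phi + h      # all m*n rows of P^u phi + h^u at once, shape (m, n)
    return pi * q        # weight the entry for state i by pi[i]

def hamiltonian(table, gamma):
    # Select entry (gamma[i], i) for each state i and add them up.
    idx = np.arange(table.shape[1])
    return table[np.asarray(gamma), idx].sum()

# Hypothetical data: control 0 keeps the state, control 1 switches it.
P = np.array([np.eye(2), [[0.0, 1.0], [1.0, 0.0]]])
h = np.array([[1.0, 0.0], [0.5, 0.5]])
pi, phi = np.array([0.5, 0.5]), np.array([3.0, 0.0])

table = hamiltonian_table(pi, P, h, phi)
values = {g: hamiltonian(table, g) for g in itertools.product([0, 1], repeat=2)}
print(values)
```

No multiplications are needed once the table exists; only the additions over states remain for each candidate control law.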

Case 1 $\Gamma_t = U_t^{X_{t-1}}$ (perfect state measurement)

This is the simplest case. Simply choose $\gamma_t^*(i) = u^*$, where

$$\sum_{j} P^{u^*}_{ij}(t) \phi_j(t) + h^{u^*}_i(t) = \min_{u} \left[ \sum_{j} P^{u}_{ij}(t) \phi_j(t) + h^{u}_i(t) \right]. \tag{3.5.5}$$

This requires $mn^2$ multiplications, $mn^2$ additions, and n sets of m comparisons.
Case 2 (imperfect state observations)

$\Gamma_t$ consists of control laws that are measurable with respect to
the field generated by a finite partition $\{A^1_{t-1}, A^2_{t-1}, \ldots, A^k_{t-1}\}$ of
$X_{t-1}$.

Choose $\gamma_t^*(i) = u^*$ for all $i \in A^\ell_{t-1}$, where

$$\sum_{i \in A^\ell_{t-1}} \pi_i(t-1) \left[ \sum_{j} P^{u^*}_{ij}(t) \phi_j(t) + h^{u^*}_i(t) \right] = \min_{u \in U_t} \sum_{i \in A^\ell_{t-1}} \pi_i(t-1) \left[ \sum_{j} P^{u}_{ij}(t) \phi_j(t) + h^{u}_i(t) \right]. \tag{3.5.6}$$

This requires about $m(n^2 + n)$ multiplications, $mn^2 + m(n-k)$ additions, and
k sets of m comparisons.

By now the close connection between cases 1 and 2 of section 3.4
and the above should be apparent. This is a consequence of the fact
that the problem

$$\min_{\gamma_t \in \Gamma_t} \pi(t-1)(P^{\gamma_t}(t) \phi(t) + h^{\gamma_t}(t))$$

is precisely a team problem as defined in section 3.4. Clearly, the
analysis of Case 3 of that section can also be extended.


Case 3 (dynamic team problem)

$\Gamma_t = \Gamma_t^1 \times \Gamma_t^2 \times \cdots \times \Gamma_t^k$, where $\Gamma_t^i$
consists of control laws measurable with respect to the field
generated by the partition

$$\{A^{i1}_{t-1}, A^{i2}_{t-1}, \ldots, A^{i k_i}_{t-1}\}, \quad i = 1, \ldots, k,$$

of $X_{t-1}$ (k controllers).

As in section 3.4, the combinatorics of the problem are overwhelming,
so that resort must be made to the notion of person-by-person optimality.
Make the following notational convention:

$$\gamma_t : X_{t-1} \to U_t^1 \times U_t^2 \times \cdots \times U_t^k, \qquad \gamma_t = (\gamma_t^1, \gamma_t^2, \ldots, \gamma_t^k).$$

Then

$$\gamma = (\gamma_1, \gamma_2, \ldots, \gamma_T) = (\gamma_1^1, \ldots, \gamma_1^k, \gamma_2^1, \ldots, \gamma_2^k, \ldots, \gamma_T^1, \ldots, \gamma_T^k).$$


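Definition

A sequence

$$\gamma^* = (\gamma_1^*, \ldots, \gamma_T^*) = (\gamma_1^{1*}, \ldots, \gamma_1^{k*}, \ldots, \gamma_T^{1*}, \ldots, \gamma_T^{k*})$$

is said to be a person-by-person extremal if

$$J(\gamma_1^{1*}, \ldots, \gamma_t^{i*}, \ldots, \gamma_T^{k*}) \le J(\gamma_1^{1*}, \ldots, \gamma_t^{i}, \ldots, \gamma_T^{k*}) \quad \text{for all } \gamma_t^i \in \Gamma_t^i, \tag{3.5.7}$$

i = 1, ..., k, t = 1, ..., T.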


Every optimal control law sequence is a person-by-person extremal,
but the converse need not be true. Algorithms 3.4.1 and 3.5.1 can be
combined to give an algorithm that always converges to a person-by-
person extremal. The order of minimization is

$$\gamma_1^1, \gamma_2^1, \ldots, \gamma_T^1;\ \gamma_1^2, \gamma_2^2, \ldots, \gamma_T^2;\ \ldots;\ \gamma_1^k, \gamma_2^k, \ldots, \gamma_T^k.$$

Thus k forward and backward sweeps of the state and costate equations
are required per iteration. The number of multiplications required
(exclusive of the state and costate computation) is

$$T \sum_{i=1}^{k} m_i (n^2 + n)$$

and

$$T \sum_{i=1}^{k} m_i (n^2 + n - k_i)$$

additions are required with

$$T \sum_{i=1}^{k} k_i$$

sets of $m_i$ comparisons.

Notice that the person-by-person approach is consistent with the minimum
principle approach:

1. Both approaches give necessary conditions for optimality.

2. Both approaches are sufficient only under convexity assumptions
that do not hold in general.

3. An initial guess is improved, but the improvement may stop
short of optimal.

These facts are consequences of the fact that the person-by-person and
min-H algorithms are actually both concrete realizations of orthogonal
search.
CHAPTER IV

DYNAMIC PROGRAMMING FOR THE FSFM PROBLEM

In this chapter, the dual dynamic programming equations for forward
and backward induction are presented. These equations follow
immediately from classical dynamic programming theory [B1] as applied
to the equivalent deterministic problem of Chapter II.

Numerical solution of the dynamic programming equations is a
difficult task. Three approaches are suggested. The first is the usual
technique of replacing the continuous state space with a discrete grid.
The second exploits the fact that the reachable and coreachable sets
of the problem are finite. The third approach applies an algorithm of
Sondik [So1, Sm1] developed for the Markovian decision problem with
incomplete state information to the FSFM problem.

The chapter closes with an example.

4.1 The Equations for Forward and Backward Induction

Recall from Chapter II that the deterministic optimal control problem
equivalent to the FSFM problem is to minimize

$$J(\gamma) = \pi(T) \phi(T) + \sum_{t=1}^{T} \pi(t-1) h^{\gamma_t}(t) \tag{4.1.1}$$

for $\gamma \in \Gamma$ subject to

$$\pi(t) = \pi(t-1) P^{\gamma_t}(t), \quad t = 1, 2, \ldots, T \tag{4.1.2}$$

$$\pi(0) = \pi_0 \tag{4.1.3}$$

where $\Gamma = \Gamma_1 \times \Gamma_2 \times \cdots \times \Gamma_T$ is a finite set.


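Define the functions

$$V_T(\pi) = \pi \phi_T, \tag{4.1.4}$$

$$V_t(\pi) = \min_{\gamma_{t+1} \in \Gamma_{t+1}} \left[ \pi h^{\gamma_{t+1}}(t+1) + V_{t+1}(\pi P^{\gamma_{t+1}}(t+1)) \right], \tag{4.1.5}$$

$$W_0(\phi) = \pi_0 \phi, \tag{4.1.6}$$

$$W_{t+1}(\phi) = \min_{\gamma_{t+1} \in \Gamma_{t+1}} W_t(P^{\gamma_{t+1}}(t+1) \phi + h^{\gamma_{t+1}}(t+1)), \tag{4.1.7}$$

for t = 0, 1, ..., T-1.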

The following theorems describe the dynamic programming algorithms
for backward and forward induction. The proofs are a simple
application of dynamic programming theory [B1].


Theorem 4.1.1 (Backward induction)

Let the map $\delta_{t+1} : \Pi_t \to \Gamma_{t+1}$ be defined for t = 0, 1, ..., T-1 by
$\delta_{t+1}(\pi(t)) = \gamma_{t+1}$ if $\gamma_{t+1}$ is a minimizing control law in (4.1.5)
for $\pi = \pi(t)$. Define the quantities $\pi^*(t)$, $\gamma_t^*$ by

$$\pi^*(0) = \pi_0, \tag{4.1.8}$$

$$\gamma_t^* = \delta_t(\pi^*(t-1)), \quad t = 1, 2, \ldots, T, \tag{4.1.9}$$

$$\pi^*(t) = \pi^*(t-1) P^{\gamma_t^*}(t), \quad t = 1, 2, \ldots, T. \tag{4.1.10}$$

Then $\gamma^* = (\gamma_1^*, \gamma_2^*, \ldots, \gamma_T^*)$ is an optimal control law sequence with
corresponding sequence $\pi^*(0), \pi^*(1), \ldots, \pi^*(T)$ of optimal states.
Theorem 4.1.2 (Forward Induction)

Let the map $\delta_{t+1} : \Phi_{t+1} \to \Gamma_{t+1}$ be defined for t = 0, 1, ..., T-1 by
$\delta_{t+1}(\phi(t+1)) = \gamma_{t+1}$ if $\gamma_{t+1}$ is the minimizing control law in (4.1.7)
for $\phi = \phi(t+1)$. Define the quantities $\phi^*(t)$, $\gamma_t^*$ by

$$\phi^*(T) = \phi_T, \tag{4.1.11}$$

$$\gamma_t^* = \delta_t(\phi^*(t)), \quad t = 1, 2, \ldots, T, \tag{4.1.12}$$

$$\phi^*(t-1) = P^{\gamma_t^*}(t) \phi^*(t) + h^{\gamma_t^*}(t), \quad t = 1, 2, \ldots, T. \tag{4.1.13}$$

Then $\gamma^* = (\gamma_1^*, \gamma_2^*, \ldots, \gamma_T^*)$ is an optimal control law sequence with
corresponding sequence $\phi^*(0), \phi^*(1), \ldots, \phi^*(T)$ of optimal costates.

There are a number of comments to be made about Theorems 4.1.1
and 4.1.2. First notice that although the original state space $X_t$
is finite, the dynamic programming algorithm must be carried out
in the uncountable state space $\Pi_t$ of the equivalent deterministic
problem. This is due to the following requirement of dynamic
programming as expressed by Bellman [B2]:

"After any number of decisions, say k, we wish the effect of
the remaining N-k stages of the decision process on the total return
to depend only upon the state of the system at the end of the k-th
decision and the subsequent decisions."

This statement may be regarded as a definition of what constitutes
a state for purposes of dynamic programming. It is well-known in
classical stochastic control that the state for dynamic programming
purposes is not the physical state, but is instead the probability
distribution of the physical state conditioned on past measurements
and inputs. The approach of this chapter, in which the unconditional
distribution of the physical state is the dynamic programming state,
is due to Witsenhausen [W4].

Second, notice that both the forward and backward algorithms
require two passes. If backward induction is applied, the first
(backward) pass determines $\delta_t$ from equations (4.1.4) and (4.1.5), and
then the second (forward) pass eliminates the dependence of the
control laws on the state $\pi(t)$. Similarly, for forward induction,
the first (forward) pass determines $\delta_t$ from equations (4.1.6) and
(4.1.7), and then the second (backward) pass eliminates the dependence
of the control laws on the costate $\phi(t)$. Thus although only the
control laws corresponding to the sequence $\pi^*(0), \pi^*(1), \ldots, \pi^*(T)$ or
$\phi^*(0), \phi^*(1), \ldots, \phi^*(T)$ are of interest, the optimal control laws
for all such sequences must be computed. The original problem
has effectively been embedded into an entire class of similar
problems.

Third, Theorems 4.1.1 and 4.1.2 exhibit a particularly striking
duality. The function $V_t(\cdot)$ gives the optimal cost-to-go at time t
for a given state probability vector $\pi$. The function $W_t(\cdot)$ gives
the optimal cost-to-go at time t for a given cost-to-go vector $\phi$.

The vector $\pi(t)$ summarizes the effect of past control laws, the
vector $\phi(t)$ summarizes the effect of future control laws, and their
scalar product $\pi(t)\phi(t)$ is the expected cost-to-go.



Finally, although Theorems 4.1.1 and 4.1.2 provide an elegant 
dual set of sufficient conditions for the FSFM problem, development 
of feasible numerical algorithms based on these theorems is difficult. 
Some possible approaches are discussed in the next section. 
4.2 Numerical Solution

Since the dynamic programming algorithms for forward and backward
induction formulated in the previous section have a continuous
state space, some type of discretization is necessary for digital
computer solution. The straightforward approach is to define a
partition over the space $\Pi_t$ or $\Phi_t$. Note, however, that if $\Pi_t$ and
$\Phi_t$ are n-dimensional and each dimension is partitioned into 100
elements (for a 1% partition), the grid has $100^n$ elements! This
is the well-known curse of dimensionality. Although numerous
heuristic schemes have been suggested to minimize this difficulty
(see [Lar1] for example), none has been widely accepted.

An alternative procedure utilizes the special structure of
the state and costate equations. Recall that the reachable sets
$r_t(\pi_0)$ are defined by equation (3.3.7). Similarly, the coreachable
sets $\rho_t(\phi_T)$ are defined by

$$\rho_T(\phi_T) = \{\phi_T\}, \tag{4.2.1}$$

$$\rho_{t-1}(\phi_T) = \{P^{\gamma_t}(t) \phi(t) + h^{\gamma_t}(t) : \gamma_t \in \Gamma_t,\ \phi(t) \in \rho_t(\phi_T)\}, \tag{4.2.2}$$

for t = 1, 2, ..., T. The interesting fact is that although $\Pi_t$ and $\Phi_t$
are continuous, the sets $r_t(\pi_0)$ and $\rho_t(\phi_T)$ are discrete. Since
the minimizations in (4.1.5) and (4.1.7) need only be carried out for
$\pi \in r_t(\pi_0)$ and $\phi \in \rho_t(\phi_T)$, respectively, it is really unnecessary to
consider $\Pi_t$ and $\Phi_t$. The difficulty with this approach is that the
sets $r_t$ and $\rho_t$, although finite, will in general be quite large.
Upper bounds to the number of elements of $r_t(\pi_0)$ and $\rho_t(\phi_T)$ are

$$\operatorname{card}(r_t(\pi_0)) \le \prod_{\tau=1}^{t} \operatorname{card}(\Gamma_\tau) \tag{4.2.3}$$

$$\operatorname{card}(\rho_t(\phi_T)) \le \prod_{\tau=t+1}^{T} \operatorname{card}(\Gamma_\tau) \tag{4.2.4}$$

where card(A), the cardinality of a finite set A, is the number of
elements of A. Although these bounds will in general not be achieved
(since two distinct control law sequences can lead to the same state
or costate), the sets $r_t(\pi_0)$ and $\rho_t(\phi_T)$ will still be too large to
be conveniently computed except in particularly simple cases.

A third approach to the numerical solution of the dynamic
programming equations uses the special structure of the problem in
a different way. The key observation is that the functions $V_t$,
$W_t$ are piecewise linear and concave.

Proposition 4.2.1 

Consider the functions $V_t(\cdot)$ defined by (4.1.4), (4.1.5). For
t = 0, 1, ..., T, there is a finite set of (column) vectors $A_t$ such that

$$V_t(\pi) = \min_{a(t) \in A_t} \pi a(t). \tag{4.2.5}$$

Proof 

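The proof proceeds by a backward induction argument. Note that
at t = T,

$$V_T(\pi) = \pi \phi_T \tag{4.2.6}$$

so that (4.2.5) holds with

$$A_T = \{\phi_T\}. \tag{4.2.7}$$

If it is assumed that

$$V_{t+1}(\pi) = \min_{a(t+1) \in A_{t+1}} \pi a(t+1), \tag{4.2.8}$$

then

$$\begin{aligned}
V_t(\pi) &= \min_{\gamma_{t+1} \in \Gamma_{t+1}} \left[ \pi h^{\gamma_{t+1}}(t+1) + V_{t+1}(\pi P^{\gamma_{t+1}}(t+1)) \right] \\
&= \min_{\gamma_{t+1} \in \Gamma_{t+1}} \left[ \pi h^{\gamma_{t+1}}(t+1) + \min_{a(t+1) \in A_{t+1}} \pi P^{\gamma_{t+1}}(t+1) a(t+1) \right] \\
&= \min_{\gamma_{t+1} \in \Gamma_{t+1},\ a(t+1) \in A_{t+1}} \left[ \pi h^{\gamma_{t+1}}(t+1) + \pi P^{\gamma_{t+1}}(t+1) a(t+1) \right] \\
&= \min_{a(t) \in A_t} \pi a(t)
\end{aligned} \tag{4.2.9}$$

where

$$A_t = \{h^{\gamma_{t+1}}(t+1) + P^{\gamma_{t+1}}(t+1) a(t+1) : \gamma_{t+1} \in \Gamma_{t+1},\ a(t+1) \in A_{t+1}\}. \tag{4.2.10}$$

Therefore, the proposition is true for all t, $0 \le t \le T$.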
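The recursion in the proof translates directly into a procedure that builds the vector sets $A_T, A_{T-1}, \ldots, A_0$ and evaluates $V_t(\pi)$ as a minimum of linear functions. The Python sketch below is an illustration only: it assumes time-invariant $P^u$, $h^u$ stored as arrays `P[u, i, j]`, `h[u, i]`, makes no attempt to discard redundant vectors, and uses hypothetical names and a hypothetical two-state example.

```python
import itertools
import numpy as np

def rows(P, h, gamma):
    """P^gamma and h^gamma: row i is taken from control gamma[i]."""
    g, idx = np.asarray(gamma), np.arange(len(gamma))
    return P[g, idx, :], h[g, idx]

def vector_sets(P, h, phi_T, laws, T):
    """Return [A_0, ..., A_T] with A_T = {phi_T} and A_t = {h + P a : a in A_{t+1}}."""
    A = [None] * (T + 1)
    A[T] = [np.asarray(phi_T, dtype=float)]
    for t in range(T - 1, -1, -1):
        A[t] = []
        for gamma in laws:
            Pg, hg = rows(P, h, gamma)
            for a in A[t + 1]:
                A[t].append(hg + Pg @ a)
    return A

def V(A_t, pi):
    # V_t(pi) is the lower envelope of the linear functions pi -> pi a.
    return min(float(np.asarray(pi) @ a) for a in A_t)

# Hypothetical example: control 0 keeps the state, control 1 switches it.
P = np.array([np.eye(2), [[0.0, 1.0], [1.0, 0.0]]])
h = np.array([[1.0, 0.0], [0.5, 0.5]])
laws = list(itertools.product([0, 1], repeat=2))
A = vector_sets(P, h, [2.0, 0.0], laws, T=2)
print(len(A[0]), V(A[0], [0.5, 0.5]), V(A[0], [1.0, 0.0]), V(A[0], [0.0, 1.0]))
```

The set $A_0$ here contains one vector per control law sequence, which shows why the unpruned sets grow geometrically with the horizon.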


A similar result holds for the function . 
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Proposition 4.2.2 

Consider the functions $W_t(\phi)$ defined by (4.1.6), (4.1.7). For 
$t = 0, 1, \ldots, T$, there is a finite set of (row) vectors $B_t$ such that 

$$W_t(\phi) = \min_{b(t) \in B_t} b(t) \phi. \qquad (4.2.11)$$

Proof 

The proof proceeds by a forward induction argument completely 
analogous to the proof of Proposition 4.2.1, and is therefore omitted. 

A representation similar to (4.2.5) has appeared in the 
literature on Markovian decision processes with incomplete state 
information. Evidently, Astrom [Asl] was the first to use the 
representation for a specific example. However, Sondik [Sol,Sml] 
has systematically exploited the representation to derive an 
algorithm for the backward equation. It is shown below that the 
Sondik algorithm can be directly applied to the backward equations 
arising in the FSFM problem. Moreover, the algorithm can be dualized 
to apply to the forward equations. 

Note that the set $A_t$ defined in the proof of Proposition 4.2.1 
is, in general, larger than necessary. For a given $a^*(t) \in A_t$, there 
may not exist a $\pi \in \Pi_t$ such that 

$$\min_{a(t) \in A_t} \pi a(t) = \pi a^*(t). \qquad (4.2.12)$$

To find a smaller set $\tilde{A}_t \subset A_t$ satisfying the above condition, it is 
necessary to look at the problem more closely. 

Suppose that a set $\tilde{A}_t$ is given that contains a minimizing element 
in (4.2.12) for every $\pi \in \Pi_t$. Suppose that $\tilde{A}_t$ has $k_t$ elements, 

$$\tilde{A}_t = \{a^1(t), \ldots, a^{k_t}(t)\}. \qquad (4.2.13)$$

Since $\tilde{A}_t$ is a subset of the set $A_t$ defined by (4.2.10), it follows that 

$$\operatorname{card}(\tilde{A}_t) \le \prod_{\tau=t+1}^{T} \operatorname{card}(\Gamma_\tau)$$

for $t = 0, 1, \ldots, T-1$, and $\operatorname{card}(\tilde{A}_T) = 1$. Let the sets $R_j(t)$ be defined 
for $j = 1, \ldots, k_t$ by 

$$R_j(t) = \left\{ \pi \in \Pi_t : \min_{a(t) \in \tilde{A}_t} \pi a(t) = \pi a^j(t) \right\}. \qquad (4.2.14)$$

Assume that $\tilde{A}_t$ is chosen so that $R_j(t) \ne \emptyset$. Clearly, 

$$\Pi_t = \bigcup_{j=1}^{k_t} R_j(t). \qquad (4.2.15)$$

Lemma 4.2.3 

The sets $R_j(t)$, $j = 1, \ldots, k_t$ and $t = 0, \ldots, T$, are convex 
sets with linear boundaries. 

Proof 

Note that 

$$R_j(t) \cap R_k(t) = \left\{ \pi \in R_j(t) \cup R_k(t) : \pi\left(a^j(t) - a^k(t)\right) = 0 \right\}, \qquad (4.2.16)$$

so that if there is a boundary between $R_j(t)$ and $R_k(t)$, it is linear. 

To prove convexity, suppose $\pi^1, \pi^2 \in R_j(t)$. Then 

$$\pi^1 a^j(t) \le \pi^1 a(t) \qquad (4.2.17)$$

$$\pi^2 a^j(t) \le \pi^2 a(t) \qquad (4.2.18)$$

for all $a(t) \in \tilde{A}_t$. Therefore, for all $\lambda \in [0,1]$, 

$$\lambda \pi^1 a^j(t) + (1-\lambda) \pi^2 a^j(t) \le \lambda \pi^1 a(t) + (1-\lambda) \pi^2 a(t) \qquad (4.2.19)$$

for all $a(t) \in \tilde{A}_t$. Equation (4.2.19) implies that $\lambda \pi^1 + (1-\lambda) \pi^2 \in R_j(t)$. 
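
For a two-state problem the simplex is a line segment, so the convex regions of the lemma are intervals and can be found exactly. The sketch below is an illustration only, not the linear programming procedure of the text: it parameterizes $\pi = (p, 1-p)$, cuts the segment at every pairwise intersection of the lines $p a_1 + (1-p) a_2$, and records which vector is minimal on each piece. The input vectors are hypothetical.

```python
from fractions import Fraction


def regions_two_state(vectors):
    """Return [(lo, hi, j)]: for lo <= p <= hi the vector with index j attains
    the minimum of p * a1 + (1 - p) * a2 over all supplied vectors (a1, a2)."""
    vecs = [(Fraction(a1), Fraction(a2)) for a1, a2 in vectors]

    def line(a, p):
        return p * a[0] + (1 - p) * a[1]

    # Candidate boundaries: interior crossing points of every pair of lines.
    cuts = {Fraction(0), Fraction(1)}
    for i, a in enumerate(vecs):
        for b in vecs[i + 1:]:
            denom = (a[0] - a[1]) - (b[0] - b[1])
            if denom != 0:
                p = (b[1] - a[1]) / denom
                if 0 < p < 1:
                    cuts.add(p)
    cuts = sorted(cuts)

    regions = []
    for lo, hi in zip(cuts, cuts[1:]):
        mid = (lo + hi) / 2
        j = min(range(len(vecs)), key=lambda k: line(vecs[k], mid))
        if regions and regions[-1][2] == j:
            regions[-1] = (regions[-1][0], hi, j)   # merge with previous piece
        else:
            regions.append((lo, hi, j))
    return regions


regions = regions_two_state([(2, 3), (4, 2), (3, 3), (2, 5)])
```

Only two of the four vectors ever attain the minimum in this example, which illustrates why the unpruned set can be larger than necessary.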

By assumption, for each $a(t) \in \tilde{A}_t \subset A_t$, there exist $\gamma_{t+1} \in \Gamma_{t+1}$ and 
$a(t+1) \in A_{t+1}$ such that 

$$a(t) = h^{\gamma_{t+1}}(t+1) + P^{\gamma_{t+1}}(t+1) a(t+1). \qquad (4.2.20)$$



Compute 

$$\min_{\gamma_t \in \Gamma_t} \ \min_{a(t) \in \tilde{A}_t} \pi\left( h^{\gamma_t}(t) + P^{\gamma_t}(t) a(t) \right) = \pi\left( h^{\gamma_t^1}(t) + P^{\gamma_t^1}(t) a^j(t) \right), \qquad (4.2.24)$$

where $\pi P^{\gamma_t^1}(t) \in R_j(t)$, and let 

$$a^1(t-1) = h^{\gamma_t^1}(t) + P^{\gamma_t^1}(t) a^j(t). \qquad (4.2.25)$$

At this stage, one point $\pi \in R_1(t-1)$ has been found, the first 
point $a^1(t-1)$ of $\tilde{A}_{t-1}$ has been obtained, and $\gamma_t^1$ determined. Next, the 
boundary of $R_1(t-1)$ is determined. Notice that $\pi \in R_1(t-1)$ if and only if 

$$\pi a^1(t-1) \le \pi\left( h^{\gamma_t}(t) + P^{\gamma_t}(t) a(t) \right) \qquad (4.2.26)$$

for all $\gamma_t \in \Gamma_t$, $a(t) \in \tilde{A}_t$. A point $\pi \in R_1(t-1)$ is on the boundary of 
$R_1(t-1)$ if and only if there exist $\gamma_t$, $a(t)$ such that 

$$\pi a^1(t-1) = \pi\left( h^{\gamma_t}(t) + P^{\gamma_t}(t) a(t) \right). \qquad (4.2.27)$$

This condition can be tested by solving the following linear 
program. The problem is to minimize 

$$\pi\left( h^{\gamma_t}(t) + P^{\gamma_t}(t) a(t) - a^1(t-1) \right) \qquad (4.2.28)$$

over $\pi \in \Pi_{t-1}$ subject to 

$$\pi\left( h^{\gamma_t}(t) + P^{\gamma_t}(t) a(t) - a^1(t-1) \right) \ge 0 \qquad (4.2.29)$$

for all $\gamma_t \in \Gamma_t$, $a(t) \in \tilde{A}_t$. The first $\gamma_t \in \Gamma_t$ and $a(t) \in \tilde{A}_t$ for which 
the minimum of (4.2.28) is zero define $\gamma_t^2 = \gamma_t$ and 

$$a^2(t-1) = h^{\gamma_t}(t) + P^{\gamma_t}(t) a(t). \qquad (4.2.30)$$

By repeatedly solving the linear program, in a similar manner all 
the vectors $a^j(t-1)$ and control laws $\gamma_t^j$ corresponding to regions 
bordering $R_1(t-1)$ are determined. The procedure is then repeated 
until all the regions $R_j(t-1)$ with the corresponding vectors 
$a^j(t-1)$ and control laws $\gamma_t^j$ are determined. Repetition of the 
algorithm permits the computation of all the quantities in (4.2.23). 

The resulting algorithm is summarized in Figure 4.2.1. The 
corresponding dual algorithm is illustrated by Figure 4.2.2. 

Of course, after one of the algorithms is carried out, a sweep 
of the state or costate equations is required to eliminate the 
dependence of the control laws on $\pi$ or $\phi$. 

Sondik's algorithm is an attempt to circumvent the curse of 
dimensionality that arises when the state space $\Pi_t$ is partitioned 
by a grid. The algorithm has the desirable properties that 

(i) it is exact, and 

(ii) the partition* may be considerably coarser than a 
naive grid partition. 

However, the number of elements of the partition is not known a priori, 
and can be expected to increase rapidly with increasing T. Moreover, 
the irregular nature of the partition sets makes computer storage 
awkward. 

Three alternative approaches to the solution of the dynamic 
programming functional equations have been given in this section. All 
of the approaches are extremely limited with respect to the size of 
problems they can handle, for reasons that have been discussed. This 
situation is not surprising, since it is well known that dynamic 
programming for problems with continuous state space is an exceedingly 
difficult computational problem. 

* The term partition is used in an informal sense here, since distinct 
elements of the partition can share a common boundary and therefore 
have a nonempty intersection. 

Figure 4.2.1 Sondik's Algorithm Applied to FSFM Problem 

Figure 4.2.2 Dual Version of Sondik's Algorithm Applied to the FSFM Problem. 

Thus, the situation for the FSFM problem is quite similar to 
that encountered in deterministic optimal control theory. Due to the 
computational difficulties associated with its use, dynamic programming 
is seldom applied to numerical solution of optimal control problems. 
Instead, algorithms based on the minimum principle are used, even 
though these algorithms may converge to extremal, rather than optimal, 
solutions. Nevertheless, these algorithms, when combined with 
appropriate engineering judgement, have been found to produce solutions 
that are often highly superior to those developed on the basis of 
intuition alone. An indication that the min-H algorithm developed in 
Chapter 3 can play a similar role for the FSFM problem is provided by 
the analysis in Chapter 6. 
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4.3 Example 

In this section, the two approaches to the numerical application 
of the dynamic programming algorithm developed in the previous 
section are applied to the second example of section 3.2. Only the 
backward equations are illustrated. 

Recall that the example has state sets 
$X_0 = \{1,2\}$, $X_1 = \{1,2,3,4\}$, $X_2 = \{1,2,3,4\}$ and 
control sets $U_1 = \{0,1\}$, $U_2 = \{0,1\}$. The parameters of the 
equivalent deterministic problem are ($0 < k < 1$): 


h (1) 



7T 


0 


1 1 
2 2 


— • 

0 

II 

CM 

w 

-0“ 

0 

0 


1 

0 


1 

k 


0 


P^O) = 


10 0 0 
0 0 10 


p°(0) = 


p 1 (l) = 


0 10 0 
0 0 0 1 

10 0 0 
10 0 0 
0 0 10 
0 0 10 
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Attention is restricted to the case in which 
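
$$\Gamma_1 = U_1^{X_0}, \quad \text{but} \quad \Gamma_2 = \{\gamma_2 \in U_2^{X_1} : \gamma_2(1) = \gamma_2(2) = \gamma_2(3)\}.$$

The reachable set $r_1(\pi_0)$ has four elements: 

$$r_1(\pi_0) = \left\{ \left[\tfrac12\ 0\ 0\ \tfrac12\right],\ \left[\tfrac12\ 0\ \tfrac12\ 0\right],\ \left[0\ \tfrac12\ 0\ \tfrac12\right],\ \left[0\ \tfrac12\ \tfrac12\ 0\right] \right\}$$

The function 

$$\pi h^{\gamma_2}(2) + \pi P^{\gamma_2}(2) \phi_2$$

must be evaluated for each $\pi \in r_1(\pi_0)$, and a corresponding minimizing 
control law tabulated. 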

The result is 

V*> = V ' 


where y 2 *(x) = 1 for x = 1,2,3, Y 2 *(4) = 0, and 

v i ([f 0 0 ?]) - I k ' 

v 2 ([? 0 7 0 ] ) ' 7 - 
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where y 1 *(l) - 1, This is in agreement with the results 

obtained earlier. 

The solution is now recomputed using the Sondik algorithm. The 
algorithm starts with 
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It is obvious that the first and third vectors of A^ are not required 
for A^ . Let the arbitrary point 7r=(10 0 0]eII^be selected. 

Performing the minimization 

Y Y 

min tt ( h 2 (2) + P 2 (2) cf. ) 

Y„£l\ 2 

1 2 2 

gives Y 2 1 (4) = 0, (1) = y^ 1 (2) = Y^O) =1, and 

a 1 (1) = 

Checking the inequalities that define (1) , 

Y Y 

Trahl) < (h 2 (2) + P 2 (2)<P 2 ) 



for all Y 2 e r 2 
^(1) = 


, it is verified that 
Or : Tr 3 <_ TT + 1? 2 > 


and that 


a 2 <l) 


1 

1 

0 

k 


where 


2 


Y 


= 0. The region R 2 (l) is defined by the inequalities 

Y Y 

77a 2 (1) <■ 77 (h 2 (2) + P 2 (2)<fi 2 ) 
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for all Y 2 


so that n 


For t = 0, 


is Yj 1 (1) 


The region 


for all Y 1 


with 


e I* 2 ' Since checking the inequalities shows that 

R-(l) = {7r : IT, > TT + TT } 
z -5 12 

= r^d Ur ^(1) , the algorithm gives 


A i = - 


choose 7T - [1 0] . A control law minimizing 

Yj. 

min TTP (1) a(l) 
a (1) eA 


1 , Yj^ (2) » 0 with 


*(0) = P 1 (1) a 1 (1) = 


R^(0) is defined by the inequalities 


ira 1 (0) < p 1 (l)a(l) 


£ q, a(l) e A computation gives 


R 1 (0) = {tt ; kir 2 <_ tt^} 


a 2 (0) = 
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Since tt (1) e (1) , is the optimal control for t = 2. This is, 
of course, in agreement with the earlier computations. 
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CHAPTER V 


THE INFINITE HORIZON FSFM PROBLEM 


In this chapter, time-invariant FSFM models operating over 
the infinite time horizon are studied. The infinite horizon model 
provides a useful approximation to problems with a finite, but 
distant and possibly unknown planning horizon. 

The cost criterion studied is the expected discounted cost. 

For this criterion, the Value and Policy Iteration methods of 
Howard [Howl] and Blackwell [Bll] can be extended to the FSFM 
problem. Moreover, the algorithms of Sondik [Sol] implementing 
these methods can also be extended to the FSFM problem. 

The chapter concludes with an example illustrating the solution 
of a simple FSFM problem by the Policy Iteration method. An 
important conclusion that can be drawn from the example is 
that the optimal control law sequence for an infinite horizon 
FSFM problem will be non-stationary in general. 
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5.1 Formulation 

In this chapter, attention is restricted to time invariant 
FSFM models of the form 

$$x(t) = f(x(t-1), u(t), q(t)) \qquad (5.1.1)$$

defined for $t = 1, 2, \ldots$, where the state sets $X_t = X_\infty$, the control 
sets $U_t = U_\infty$, and the uncertainty sets $Q_t = Q_\infty$ are all independent 
of time. Moreover, the probability function $p_t = p_\infty$ on $Q_t = Q_\infty$ is 
assumed time invariant, and the sets $\Gamma_t = \Gamma_\infty$ of admissible control 
functions are assumed constant. 

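Let the sets $X$ and $\Gamma$ be defined as follows: 

$$X = X_0 \times X_1 \times X_2 \times \cdots \qquad (5.1.2)$$

$$\Gamma = \Gamma_1 \times \Gamma_2 \times \Gamma_3 \times \cdots \qquad (5.1.3)$$

For each $\gamma = (\gamma_1, \gamma_2, \gamma_3, \ldots) \in \Gamma$, a sequence of matrices $P^{\gamma_t}$ is 
defined by 

$$P^{\gamma_t}_{ij} = p_\infty(\{q : j = f(i, \gamma_t(i), q)\}). \qquad (5.1.4)$$

The matrix $P^{\gamma_t}$ can be interpreted as a transition probability. That 
is, for each $i \in X_{t-1}$, a probability measure on $X_t$ is defined with 
$P^{\gamma_t}_{ij}$ equal to the probability of $j \in X_t$. Since $\pi_0$ defines a 
probability measure on $X_0$, the theorem of Tulcea [Lol] can be invoked to 
establish the existence of a unique probability measure $\nu^\gamma$ 
on $(X, \mathcal{X})$ ($\mathcal{X} = P(X)$) satisfying 

$$\nu^\gamma(A_0 \times A_1 \times \cdots \times A_t \times X_{t+1} \times X_{t+2} \times \cdots) = \sum_{i_0 \in A_0} \pi_{0, i_0} \sum_{i_1 \in A_1} P^{\gamma_1}_{i_0 i_1} \sum_{i_2 \in A_2} P^{\gamma_2}_{i_1 i_2} \cdots \sum_{i_t \in A_t} P^{\gamma_t}_{i_{t-1} i_t}. \qquad (5.1.5)$$

In particular, from (5.1.5) it follows that 

$$\nu^\gamma(X_0 \times X_1 \times \cdots \times X_{t-1} \times A_t \times X_{t+1} \times X_{t+2} \times \cdots) = \sum_{i_t \in A_t} \pi^\gamma_{i_t}(t), \qquad (5.1.6)$$

where the sequence $\pi^\gamma(t)$ is defined inductively by 

$$\pi^\gamma(0) = \pi_0, \qquad (5.1.7)$$

$$\pi^\gamma(t) = \pi^\gamma(t-1) P^{\gamma_t}, \quad t = 1, 2, \ldots, \qquad (5.1.8)$$

where $\gamma = (\gamma_1, \gamma_2, \ldots)$. 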
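
The definition of the transition matrix and the state recursion translate directly into code. The sketch below assumes a hypothetical two-state model with a binary disturbance; the dynamics `f`, the disturbance probabilities, and the control law `gamma` are illustrative choices, not data from the text.

```python
def transition_matrix(states, noise_prob, f, gamma):
    """P[i][j] = probability of the set {q : f(i, gamma(i), q) = j}."""
    index = {x: n for n, x in enumerate(states)}
    P = [[0.0] * len(states) for _ in states]
    for i in states:
        for q, prob in noise_prob.items():
            P[index[i]][index[f(i, gamma[i], q)]] += prob
    return P


def propagate(pi0, matrices):
    """Return [pi(0), pi(1), ...] with pi(t) = pi(t-1) P for the t-th matrix."""
    path = [list(pi0)]
    for P in matrices:
        prev = path[-1]
        path.append([sum(prev[i] * P[i][j] for i in range(len(prev)))
                     for j in range(len(P[0]))])
    return path


# Hypothetical model: two states, binary control, binary disturbance.
states = [0, 1]
noise_prob = {0: 0.75, 1: 0.25}


def f(x, u, q):
    return (x + u + q) % 2


gamma = {0: 1, 1: 0}            # a control law u = gamma(x)
P = transition_matrix(states, noise_prob, f, gamma)
path = propagate([1.0, 0.0], [P, P])
```

Each row of `P` sums to one, and `path` is the deterministic state sequence of the equivalent problem.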


Defining the cost $J$ of operating the system is a delicate problem. 
One approach might be to define 

$$J = \sum_{t=1}^{\infty} h(x(t-1), u(t)). \qquad (5.1.9)$$

This approach suffers from several defects. If $h(x(t-1), u(t)) > 0$ 
for all $x(t-1) \in X_{t-1}$, $u(t) \in U_t$, then $J = +\infty$ for every control, and 
the cost criterion is useless. If $h(x(t-1), u(t)) \ge 0$ and 
$h(x(t-1), u(t)) = 0$ for some $x(t-1) \in X_{t-1}$, $u(t) \in U_t$, then the 
cost criterion is still infinite unless the non-zero cost states 
occur only finitely often. This case might be of interest in some 
situations, but is clearly rather special. Similar comments apply 
if the direction of the preceding inequalities is reversed. If 
the function $h$ is allowed to assume both positive and negative values, 
then there is no assurance that the summation in (5.1.9) is well 
defined (for example, $\sum_{t=1}^{\infty} (-1)^t$ does not converge). 

A second approach is the definition 

$$J = \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} h(x(t-1), u(t)) \qquad (5.1.10)$$

of the average cost per unit time. This cost is never 
infinite; if 

$$\sup_{x \in X_\infty} \ \sup_{u \in U_\infty} |h(x, u)| = k, \qquad (5.1.11)$$

then 

$$|J| \le k. \qquad (5.1.12)$$

However, $J$ need not be defined for all sequences $x(t-1)$, $u(t)$. Suppose 
that $X_\infty = \{0, 1\}$, $h(0) = 0$ and $h(1) = 1$ (independent of $u$). 

Then $J$ is not defined for the sequence 
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1 s 2n-l - 4 ' s 2n 
0 s 2„ i ‘ s 2n+l 


where n = 1,2,3,... and the sequence is defined by 



s , (n+1) £ s , , n 1,2,... 

HtI . _ 1 


For this sequence, 

$$\limsup_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} h(x(t-1), u(t)) = 1$$

but 

$$\liminf_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} h(x(t-1), u(t)) = 0.$$


(5.1.13) 

(5.1.14) 


(5.1.15) 

(5.1.16) 


(5.1.17) 


(5.1.18) 
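
The failure of the time average to converge can be checked numerically on a stand-in sequence. The sketch below does not reproduce the sequence of the text; it assumes, purely for illustration, that the cost is 1 throughout odd-numbered blocks and 0 throughout even-numbered blocks, with block n ending at time n!. Exact rational arithmetic is used so the averages are not affected by rounding.

```python
from fractions import Fraction
from math import factorial


def block_end_averages(num_blocks):
    """Running average cost at the end of each block, where block n ends at
    time n! and every stage in block n costs 1 if n is odd, 0 if n is even."""
    averages = []
    ones = 0
    previous_end = 0
    for n in range(1, num_blocks + 1):
        end = factorial(n)
        if n % 2 == 1:
            ones += end - previous_end
        averages.append(Fraction(ones, end))
        previous_end = end
    return averages


averages = block_end_averages(10)
odd_block_ends = [float(a) for a in averages[0::2]]    # approach 1 as n grows
even_block_ends = [float(a) for a in averages[1::2]]   # approach 0 as n grows
```

Because each block is much longer than everything before it, the running average is pulled close to 1 at the end of odd blocks and close to 0 at the end of even blocks, so the limit in (5.1.10) does not exist for this sequence.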


A third approach is the definition 

$$J = \sum_{t=1}^{\infty} \beta^{t-1} h(x(t-1), u(t)) \qquad (5.1.19)$$

where $0 < \beta < 1$ is the discount rate. Discount factors occur naturally 
in an economic context when the present value of a stream of future 
earnings must be determined [Anl]. In other contexts, $\beta$ can be regarded 
simply as a convergence factor used to achieve an approximation of the 
average cost (5.1.10).* Existence of $J$ is assured. Let $k$ be defined 
as in (5.1.11); then 

$$\sum_{t=1}^{\infty} \beta^{t-1} |h(x(t-1), u(t))| \le k \sum_{t=1}^{\infty} \beta^{t-1} = \frac{k}{1-\beta} \qquad (5.1.20)$$

so that the series defining $J$ converges for any sequence of states and 
controls. 

* Ross [Rol, Ro2] and Mine and Osaki [Mil] consider the limit as $\beta \to 1$. 

The definition (5.1.19) will be adopted in this chapter, although 
some further remarks concerning (5.1.10) will be made. 

It is necessary to define the expected value of $J$ to state the 
infinite horizon version of the FSFM problem. By (5.1.20), the 
summation 

$$J^\gamma = \sum_{t=1}^{\infty} \beta^{t-1} h(x(t-1), \gamma_t(x(t-1))) \qquad (5.1.21)$$

exists for all 

$$\gamma = (\gamma_1, \gamma_2, \ldots) \in \Gamma$$

and sequences 

$$(x(0), x(1), x(2), \ldots) \in X.$$

Therefore (5.1.21) defines a map 

$$J^\gamma : X \to R \qquad (5.1.22)$$

that is automatically measurable with respect to $\mathcal{X} = P(X)$. Moreover, 

$$E^\gamma\{|J^\gamma|\} = \int_X |J^\gamma(x)| \, d\nu^\gamma(x) \le \frac{k}{1-\beta} \qquad (5.1.23)$$

so that $J^\gamma$ is integrable. Therefore, the functional 

$$J(\gamma) = E^\gamma\{J^\gamma\} = \int_X J^\gamma(x) \, d\nu^\gamma(x) \qquad (5.1.24)$$

can be defined. The infinite horizon FSFM problem with discounted 
cost criterion is then to minimize $J(\gamma)$ over all $\gamma \in \Gamma$. 

Since the sum (5.1.21) is defined for all 

$$(x(0), x(1), x(2), \ldots) \in X,$$

and since the bound (5.1.20) holds, application of Lebesgue's dominated 
convergence theorem (see, e.g., [Rul] or [Sel]) 
shows that 

$$\begin{aligned}
E^\gamma\left\{ \sum_{t=1}^{\infty} \beta^{t-1} h(x(t-1), \gamma_t(x(t-1))) \right\}
&= \sum_{t=1}^{\infty} \beta^{t-1} E^\gamma\{ h(x(t-1), \gamma_t(x(t-1))) \} \\
&= \sum_{t=1}^{\infty} \beta^{t-1} \pi^\gamma(t-1) h^{\gamma_t}
\end{aligned} \qquad (5.1.25)$$

for all control law sequences $\gamma = (\gamma_1, \gamma_2, \ldots)$ (possibly non-stationary), 
where 

$$\pi^\gamma(0) = \pi_0, \qquad (5.1.26)$$

$$\pi^\gamma(t) = \pi^\gamma(t-1) P^{\gamma_t}, \quad t = 1, 2, \ldots \qquad (5.1.27)$$

Therefore the infinite horizon FSFM problem with discounting is 
equivalent to the deterministic problem of minimizing (5.1.25) subject 
to (5.1.26), (5.1.27). 
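
The deterministic form of the cost is easy to evaluate numerically once a control law sequence is fixed. The following sketch assumes the sequence is supplied through a hypothetical function `law_at(t)` returning the pair (h, P) for stage t; it sums the first N terms of (5.1.25) and uses the geometric tail from (5.1.20) as a truncation bound.

```python
def discounted_cost(pi0, law_at, beta, horizon):
    """Sum of beta**(t-1) * pi(t-1) h for t = 1..horizon, with pi(t) = pi(t-1) P."""
    pi = list(pi0)
    total = 0.0
    discount = 1.0
    for t in range(1, horizon + 1):
        h, P = law_at(t)
        total += discount * sum(p * c for p, c in zip(pi, h))
        pi = [sum(pi[i] * P[i][j] for i in range(len(pi)))
              for j in range(len(P[0]))]
        discount *= beta
    return total


def tail_bound(k, beta, horizon):
    """Bound on the neglected terms t > horizon: k * beta**horizon / (1 - beta)."""
    return k * beta ** horizon / (1.0 - beta)


# Hypothetical stationary law: the state alternates, cost 1 is paid in state 1.
def law_at(t):
    return (1.0, 0.0), [[0.0, 1.0], [1.0, 0.0]]


cost = discounted_cost((1.0, 0.0), law_at, beta=0.5, horizon=60)
```

For this alternating example the exact cost is the geometric series 1 + 1/4 + 1/16 + ... = 4/3, which the truncated sum reproduces to within the tail bound.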
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To apply the method of dynamic programming, the problem defined 
by equations (5.1.25) - (5.1.27) is imbedded in a series of similar 
problems. Define functions 


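$$V_t(\pi) = \inf_{\gamma \in \Gamma} \sum_{\tau=t}^{\infty} \beta^{\tau-t} \pi^\gamma(\tau) h^{\gamma_{\tau+1}} \qquad (5.1.28)$$

for $t \ge 0$, where $\gamma = (\gamma_1, \gamma_2, \ldots)$ and 

$$\pi^\gamma(t) = \pi, \qquad (5.1.29)$$

$$\pi^\gamma(\tau) = \pi^\gamma(\tau-1) P^{\gamma_\tau}, \quad \tau > t. \qquad (5.1.30)$$

(Notice that the summation in (5.1.28) is independent of the first 
$t$ components of $\gamma$.) 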

Lemma 5.1.1 

For all $s, t \ge 0$ and all $\pi \in \Pi_t = \Pi_\infty$, 

$$V_s(\pi) = V_t(\pi) = V(\pi). \qquad (5.1.31)$$


Proof 


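Suppose that $t > s$. Then the control law sequences 

$$\hat{\gamma} = (\hat{\gamma}_1, \ldots, \hat{\gamma}_t, \gamma_{s+1}, \gamma_{s+2}, \ldots)$$

and 

$$\gamma = (\gamma_1, \ldots, \gamma_s, \gamma_{s+1}, \ldots)$$

satisfy 

$$\sum_{\tau=t}^{\infty} \beta^{\tau-t} \pi^{\hat{\gamma}}(\tau) h^{\hat{\gamma}_{\tau+1}} = \sum_{\tau=s}^{\infty} \beta^{\tau-s} \pi^{\gamma}(\tau) h^{\gamma_{\tau+1}} \qquad (5.1.32)$$

when $\pi^{\hat{\gamma}}(t) = \pi^{\gamma}(s) = \pi$. Therefore, the infima of the above sums must be equal. 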


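Lemma 5.1.2 

The functions $V_t(\cdot)$ satisfy the sequence of optimality equations 

$$V_t(\pi) = \min_{\gamma_{t+1} \in \Gamma_{t+1}} \left\{ \pi h^{\gamma_{t+1}} + \beta V_{t+1}(\pi P^{\gamma_{t+1}}) \right\} \qquad (5.1.33)$$

Proof 

$$\begin{aligned}
V_t(\pi) &= \inf_{\gamma \in \Gamma} \sum_{\tau=t}^{\infty} \beta^{\tau-t} \pi^\gamma(\tau) h^{\gamma_{\tau+1}} \\
&= \inf_{\gamma \in \Gamma} \left\{ \pi h^{\gamma_{t+1}} + \beta \sum_{\tau=t+1}^{\infty} \beta^{\tau-t-1} \pi^\gamma(\tau) h^{\gamma_{\tau+1}} \right\} \\
&= \min_{\gamma_{t+1} \in \Gamma_{t+1}} \left\{ \pi h^{\gamma_{t+1}} + \beta \inf_{\gamma_{t+2} \in \Gamma_{t+2},\, \gamma_{t+3} \in \Gamma_{t+3},\, \ldots} \ \sum_{\tau=t+1}^{\infty} \beta^{\tau-t-1} \pi^\gamma(\tau) h^{\gamma_{\tau+1}} \right\} \\
&= \min_{\gamma_{t+1} \in \Gamma_{t+1}} \left\{ \pi h^{\gamma_{t+1}} + \beta V_{t+1}(\pi P^{\gamma_{t+1}}) \right\}
\end{aligned} \qquad (5.1.34)$$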
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Lemma 5.1.3 


The function $V(\cdot)$ is the unique bounded solution of the equation 

$$V(\pi) = \min_{\gamma_\infty \in \Gamma_\infty} \left\{ \pi h^{\gamma_\infty} + \beta V(\pi P^{\gamma_\infty}) \right\} \qquad (5.1.35)$$

for $t$ arbitrary. 


Proof 

By Lemmas 5.1.1 and 5.1.2, $V(\cdot)$ is a solution of (5.1.35). Since 

$$\left| \sum_{\tau=t}^{\infty} \beta^{\tau-t} \pi^\gamma(\tau) h^{\gamma_{\tau+1}} \right| \le \sum_{\tau=t}^{\infty} \beta^{\tau-t} \left| \pi^\gamma(\tau) h^{\gamma_{\tau+1}} \right| \le k \sum_{\tau=t}^{\infty} \beta^{\tau-t} = \frac{k}{1-\beta}, \qquad (5.1.36)$$

it follows that $V(\cdot)$ is bounded. Therefore, it is enough to show that 
(5.1.35) has a unique bounded solution. 

Let $B(\Pi_\infty)$ be the set of bounded real-valued functions on $\Pi_\infty$. With 
the norm 

$$\|V_1 - V_2\| = \sup_{\pi \in \Pi_\infty} |V_1(\pi) - V_2(\pi)|, \qquad (5.1.37)$$

$B(\Pi_\infty)$ is a Banach space [Rul]. Define the operation 

$$A : B(\Pi_\infty) \to B(\Pi_\infty), \quad V \mapsto AV \qquad (5.1.38)$$

by 

$$AV(\pi) = \min_{\gamma_\infty \in \Gamma_\infty} \left\{ \pi h^{\gamma_\infty} + \beta V(\pi P^{\gamma_\infty}) \right\}. \qquad (5.1.39)$$

Then the problem is to show that the equation 

$$AV = V \qquad (5.1.40)$$

has a unique solution. 

To show that (5.1.40) has a unique bounded solution it is 
sufficient by the Contraction Mapping Theorem for Banach spaces [Sil] to 
show that $A$ is a contraction. But $A$ is a contraction by the following 
argument due to Blackwell [Bll]. Notice that 

$$A(V + c) = AV + \beta c \qquad (5.1.41)$$

for any constant $c$, and that $V_1 \le V_2$ (i.e., $V_1(\pi) \le V_2(\pi)$ for all $\pi \in \Pi_\infty$) implies 

$$AV_1 \le AV_2. \qquad (5.1.42)$$

Let $c = \|V_1 - V_2\|$; then 

$$V_1 \le V_2 + \|V_1 - V_2\| \qquad (5.1.43)$$

implies that 

$$AV_1 \le A(V_2 + \|V_1 - V_2\|) = AV_2 + \beta \|V_1 - V_2\|.$$

By a symmetrical argument, 

$$AV_2 \le AV_1 + \beta \|V_1 - V_2\| \qquad (5.1.44)$$

so that 

$$\|AV_1 - AV_2\| \le \beta \|V_1 - V_2\|. \qquad (5.1.45)$$


Theorem 5.1.4 

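Let the map $\delta^* : \Pi_\infty \to \Gamma_\infty$ be defined by $\delta^*(\pi) = \gamma_\infty$ 
if $\gamma_\infty$ is a minimizing control law in (5.1.35). 
Define the quantities $\pi^*(t)$, $\gamma_t^*$ for $t = 0, 1, 2, \ldots$ by 

$$\pi^*(0) = \pi_0, \qquad (5.1.46)$$

$$\gamma_t^* = \delta^*(\pi^*(t-1)), \quad t \ge 1, \qquad (5.1.47)$$

$$\pi^*(t) = \pi^*(t-1) P^{\gamma_t^*}, \quad t \ge 1. \qquad (5.1.48)$$

Then $\gamma^* = (\gamma_1^*, \gamma_2^*, \ldots)$ is an optimal control law sequence with 
$\pi^*(t)$ the corresponding sequence of optimal states. 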


Proof 


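By (5.1.33) and the definition of $\gamma_{t+1}^*$, 

$$\begin{aligned}
V_t(\pi^*(t)) &= \min_{\gamma_{t+1} \in \Gamma_{t+1}} \left\{ \pi^*(t) h^{\gamma_{t+1}} + \beta V_{t+1}(\pi^*(t) P^{\gamma_{t+1}}) \right\} \\
&= \pi^*(t) h^{\gamma_{t+1}^*} + \beta V_{t+1}(\pi^*(t+1))
\end{aligned} \qquad (5.1.49)$$

for $t = 0, 1, 2, \ldots$. Therefore, a simple induction argument establishes that 

$$V_0(\pi_0) = \sum_{\tau=1}^{t+1} \beta^{\tau-1} \pi^*(\tau-1) h^{\gamma_\tau^*} + \beta^{t+1} V_{t+1}(\pi^*(t+1)) \qquad (5.1.50)$$

for all $t$. Recall from (5.1.36) that $V_{t+1}(\pi^*(t+1))$ is bounded, 

$$|V_{t+1}(\pi^*(t+1))| \le \frac{k}{1-\beta}. \qquad (5.1.51)$$

Therefore, 

$$V_0(\pi_0) = \lim_{t \to \infty} \left[ \sum_{\tau=1}^{t+1} \beta^{\tau-1} \pi^*(\tau-1) h^{\gamma_\tau^*} + \beta^{t+1} V_{t+1}(\pi^*(t+1)) \right] = \sum_{\tau=1}^{\infty} \beta^{\tau-1} \pi^*(\tau-1) h^{\gamma_\tau^*} \qquad (5.1.52)$$

which was to be shown. 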

To employ Theorem 5.1.4, it is necessary to compute the function 
$V$. Since $A$ is a contraction, $V$ can be computed iteratively by a 
successive approximation approach. Define the sequence $V_n \in B(\Pi_\infty)$, 
$n = 0, 1, 2, \ldots$, by 

$$V_{n+1} = AV_n \qquad (5.1.53)$$

where $V_0$ is an arbitrary element of $B(\Pi_\infty)$. Notice that the notation 
is ambiguous since it does not distinguish between the forward sequence 
$V_t$ and the backward sequence $V_n$. $V_n$ is often referred to as the 
"cost with n periods remaining" in the literature on Markovian decision 
processes [Sol], but this is not strictly meaningful, because there 
is never a finite number of periods remaining in an infinite horizon 
problem. Similarly, the statement that the sequence of control laws 
with n periods remaining is to be determined is not logical. What 
control law will be used at the first stage? 
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Lemma 5.1.5 


The sequence $V_n$ defined above converges to $V$. Moreover, 

(i) $\|A^n V_0 - V\| \le \beta^n \|V_0 - V\|$, (5.1.54) 

(ii) $\|A^n V_0 - V\| \le \dfrac{\beta^n}{1-\beta} \|AV_0 - V_0\|$. (5.1.55) 

Proof 

Since $A$ is a contraction (Lemma 5.1.3), Lemma 5.1.5 is an immediate 
consequence of Theorem 1.XVI of Kantorovich and Akilov [Kal]. 

Notice that (5.1.55) gives a bound on the error at iteration 
number n that is independent of $V$ and can therefore be precomputed. 
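
A small numerical sketch may help fix ideas about the successive approximation and its precomputable error bound. It assumes that each iterate is stored as a finite set of vectors (so that the function value at $\pi$ is the minimum inner product), uses a simple componentwise-dominance pruning in place of the linear programming step of Chapter 4, and takes the standard contraction-mapping form of the a priori bound. The two control laws and the discount rate are hypothetical.

```python
def prune(vectors):
    """Drop componentwise-dominated vectors; the pointwise minimum is unchanged."""
    vectors = list(vectors)
    kept = [a for a in vectors
            if not any(b != a and all(y <= x for x, y in zip(a, b))
                       for b in vectors)]
    return set(kept)


def apply_A(vectors, laws, beta):
    """One application of A to V(pi) = min_a pi a, returned as a vector set."""
    out = set()
    for h, P in laws:
        for a in vectors:
            out.add(tuple(h[i] + beta * sum(P[i][j] * a[j] for j in range(len(a)))
                          for i in range(len(h))))
    return prune(out)


def evaluate(pi, vectors):
    return min(sum(p * x for p, x in zip(pi, a)) for a in vectors)


def iterations_needed(beta, first_step_norm, eps):
    """Smallest n with beta**n / (1 - beta) * first_step_norm <= eps."""
    n = 0
    while beta ** n / (1.0 - beta) * first_step_norm > eps:
        n += 1
    return n


# Hypothetical data: two states, two stationary control laws, beta = 0.5.
laws = [((1.0, 0.0), [[0.0, 1.0], [1.0, 0.0]]),
        ((1.0, 1.0), [[1.0, 0.0], [0.0, 1.0]])]
V = {(0.0, 0.0)}                      # V_0 = 0
for _ in range(12):
    V = apply_A(V, laws, 0.5)
n_required = iterations_needed(0.5, 1.0, 1e-3)
```

In this example the second law is never preferred, so pruning keeps a single vector per iteration; in general the set can grow with every application of the operator, which is the practical limitation discussed in Section 5.2.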

The method of determining the optimal control law by the iteration 
(5.1.53) is referred to as Value Iteration [Howl] in the literature on 
Markovian decision processes. This is in distinction to the method of 
Policy Iteration introduced by Howard [Howl] and extended by Blackwell 
[Bll]. The policy iteration method can also be extended to the FSFM 
problem, as will be demonstrated next. 

For any policy $\delta : \Pi_\infty \to \Gamma_\infty$, define the sequence of functions 
$V_n^\delta : \Pi_\infty \to R$ by 

$$V_0^\delta(\pi) = 0 \qquad (5.1.56)$$

$$V_{n+1}^\delta(\pi) = \pi h^{\delta(\pi)} + \beta V_n^\delta(\pi P^{\delta(\pi)}) \qquad (5.1.57)$$

for $n = 0, 1, 2, \ldots$. Let the operator 

$$A^\delta : B(\Pi_\infty) \to B(\Pi_\infty)$$

be defined by 

$$(A^\delta V)(\pi) = \pi h^{\delta(\pi)} + \beta V(\pi P^{\delta(\pi)}). \qquad (5.1.58)$$

Lemma 5.1.6 

For any policy $\delta$, the operator $A^\delta$ is monotone; i.e., if 
$V_1 \le V_2$, then $A^\delta V_1 \le A^\delta V_2$. 

Proof 

The property follows immediately from (5.1.58). 


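Theorem 5.1.7 (Howard Policy Iteration) 

Let the policy $\hat{\delta}$ be defined by $\hat{\delta}(\pi) = \hat{\gamma}$, where $\hat{\gamma}$ satisfies 

$$\pi h^{\hat{\gamma}} + \beta V^\delta(\pi P^{\hat{\gamma}}) = \min_{\gamma \in \Gamma_\infty} \left\{ \pi h^{\gamma} + \beta V^\delta(\pi P^{\gamma}) \right\} \qquad (5.1.59)$$

for some arbitrary policy $\delta$. Then 

$$V^{\hat{\delta}}(\pi) \le V^\delta(\pi) \qquad (5.1.60)$$

for all $\pi \in \Pi_\infty$. ($V^\delta = \lim_{n \to \infty} V_n^\delta$ exists since $A^\delta$ is a contraction.) 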
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Proof 

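The method of proof is due to Blackwell [Bll]. 
By (5.1.59), 

$$A^{\hat{\delta}} V^\delta \le A^\delta V^\delta = V^\delta. \qquad (5.1.61)$$

Since $A^{\hat{\delta}}$ is monotone, 

$$(A^{\hat{\delta}})^2 V^\delta \le A^{\hat{\delta}} V^\delta \le V^\delta \qquad (5.1.62)$$

and so by induction 

$$(A^{\hat{\delta}})^n V^\delta \le V^\delta. \qquad (5.1.63)$$

Since $A^{\hat{\delta}}$ is a contraction (with modulus $\beta$), 

$$\lim_{n \to \infty} (A^{\hat{\delta}})^n V^\delta = V^{\hat{\delta}}, \qquad (5.1.64)$$

which establishes (5.1.60). 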


Before turning to the important question of using Value or Policy 
Iteration as a numerical technique, it is appropriate to comment on the 
alternative definition (5.1.10) of J. 

As mentioned earlier, $J$ may not be well defined by (5.1.10), so 
that the formal computation 

$$E^\gamma\left\{ \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} h(x(t-1), \gamma_t(x(t-1))) \right\} = \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} E^\gamma\{ h(x(t-1), \gamma_t(x(t-1))) \} \qquad (5.1.65)$$

is not valid in general. Moreover, the limit on the right hand side 
of (5.1.65) need not exist. This has led many authors [Kul, Del, 
Mil] to adopt the definition 

$$J(\gamma) = \limsup_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} E^\gamma\{ h(x(t-1), \gamma_t(x(t-1))) \}. \qquad (5.1.66)$$

The lim sup in (5.1.66) always exists since 

$$\left| \frac{1}{T} \sum_{t=1}^{T} E^\gamma\{ h(x(t-1), \gamma_t(x(t-1))) \} \right| \le k \qquad (5.1.67)$$

for all T. 

The disadvantage of the definition (5.1.66) is that it is difficult 
to give a meaningful interpretation to the functional $J(\gamma)$. 
However, if attention is restricted to the class of stationary control 
laws ($\gamma_t = \gamma_\infty$ for all $t$), then a natural interpretation of $J(\gamma)$ is 
available. 

For stationary control laws, 

$$P^{\gamma_t} = P^{\gamma_T} = P^{\gamma} \qquad (5.1.68)$$

$$h^{\gamma_t} = h^{\gamma_T} = h^{\gamma} \qquad (5.1.69)$$

for all $1 \le t, T < \infty$. Moreover, by (5.1.6) 

$$E^\gamma\{ h(x(t-1), \gamma_t(x(t-1))) \} = \pi^\gamma(t-1) h^\gamma = \pi_0 (P^\gamma)^{t-1} h^\gamma. \qquad (5.1.70)$$
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But Theorem 2.1 of Doob [Dol] (Chapter 5) states that there exists a 

stochastic matrix $P_\infty^\gamma$ such that 

$$\lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} (P^\gamma)^t = P_\infty^\gamma \qquad (5.1.71)$$

Therefore, 

$$\lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} E^\gamma\{ h(x(t-1), \gamma_t(x(t-1))) \} = \pi_\infty^\gamma h^\gamma \qquad (5.1.72)$$

where 

$$\pi_\infty^\gamma = \pi_0 P_\infty^\gamma \qquad (5.1.73)$$

defines the long-run distribution of the Markov chain. Thus, 
$\pi_\infty^\gamma h^\gamma$ is the expected cost per transition under the long-run distribution 
induced by the stationary control law sequence $\gamma$. 
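
The role of the averaged limit can be seen on a periodic chain, where the powers of the transition matrix do not converge but their running average does. The sketch below uses a hypothetical two-state chain that alternates deterministically; it is an illustration of the averaged limit, not of any particular example in the text.

```python
def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def cesaro_average(P, T):
    """(1/T) * (P + P^2 + ... + P^T)."""
    n = len(P)
    power = [[float(i == j) for j in range(n)] for i in range(n)]
    total = [[0.0] * n for _ in range(n)]
    for _ in range(T):
        power = mat_mul(power, P)
        total = [[total[i][j] + power[i][j] for j in range(n)] for i in range(n)]
    return [[x / T for x in row] for row in total]


P_swap = [[0.0, 1.0], [1.0, 0.0]]     # period-2 chain: powers alternate forever
P_inf = cesaro_average(P_swap, 1000)

pi0 = [1.0, 0.0]
h = [1.0, 0.0]
pi_inf = [sum(pi0[i] * P_inf[i][j] for i in range(2)) for j in range(2)]
average_cost = sum(p * c for p, c in zip(pi_inf, h))
```

The averaged matrix has every entry equal to one half, so the long-run distribution is uniform and the expected cost per transition is one half, even though the chain itself never settles.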

Although the average cost criterion has been studied extensively for 
problems with finite state space [Howl, Kul, Del, Mil], there are few 
results available that apply to continuous state spaces such as $\Pi_\infty$. 
Therefore attention is restricted in the sequel to the expected 
discounted cost. 
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5.2 Numerical Solution of the Functional Equation 

In the previous section, it was demonstrated that solving the 
infinite horizon FSFM problem with discounted cost criterion was 
equivalent to solving the functional equation 

$$AV = V. \qquad (5.2.1)$$

Two theoretical methods, Value Iteration and Policy Iteration, were 
described for the solution of (5.2.1). 

The solution of functional equations similar to (5.2.1) is a 
classical problem of dynamic programming. Although numerous suggestions 
have been advanced (see [Bl], [B2], [B3], for example), no satisfactory 
general algorithm has ever been found. However, Sondik [Sol] has 
recently developed an algorithm utilizing policy improvement for 
solution of Markovian decision processes with incomplete state 
information. Since the functional equation that arises in the solution of 
these processes is similar to (5.2.1), Sondik's algorithm can be 
applied to the FSFM problem, as will be demonstrated in this section. 

The most straightforward approach to the solution of (5.2.1) is 
to use Value Iteration. Starting the algorithm with $V_0 = 0$, 

$$V_{n+1} = AV_n \qquad (5.2.2)$$

can be obtained using the backward Sondik algorithm as in Figure 4.2.1. 
The difficulty with this approach is that the algorithm must be 
iterated until the factor $\beta^n$ is small enough to ensure that $V_n$ 
approximates $V$ to the desired accuracy. For discount rates $\beta$ close to 
one, this requires a large number of iterations. But the Sondik 
algorithm is practical for only a small number of iterations, in 
general. 

A second approach to the solution of (5.2.1) is to use Policy 
Iteration. Policy Iteration consists of two steps: value 
determination and policy improvement [Howl]. These steps are illustrated in 
Figure 5.2.1. One might conjecture that the Sondik algorithm could 
be utilized to carry out both steps. However, neither $V_n^\delta$ nor $V^\delta$ 
is in general piecewise linear and concave, as is required for the 
Sondik algorithm. These same difficulties arise in the partially 
observable Markovian decision problem, and Sondik has developed an 
alternative approach [Sol]. As will be demonstrated next, this 
approach can be applied to the FSFM problem also. 

The value determination step of the policy iteration algorithm 
requires that the functional equation 

$$A^\delta V = V \qquad (5.2.3)$$

be solved to determine $V^\delta$. For a special class of policies, this 
equation can be readily solved. 

Lemma 5.2.1 

Suppose that $X_\infty$ has n elements so that the (row) vectors in 
$\Pi_\infty$ are n-dimensional. Then for every policy $\delta : \Pi_\infty \to \Gamma_\infty$ there 
exists a map 

$$a^\delta : \Pi_\infty \to A, \qquad (5.2.4)$$

where $A$ is the set of n-dimensional column vectors, such that 

$$V^\delta(\pi) = \pi a^\delta(\pi) \qquad (5.2.5)$$

for all $\pi \in \Pi_\infty$. 

Figure 5.2.1 Policy Iteration (After [Howl], Figure 7.1) 


Proof 


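Recall that $V^\delta$ is the unique solution of the functional equation 

$$V^\delta(\pi) = (A^\delta V^\delta)(\pi) = \pi h^{\delta(\pi)} + \beta V^\delta(\pi P^{\delta(\pi)}). \qquad (5.2.6)$$

Therefore, the representation (5.2.5) is valid if the functional 
equation 

$$a^\delta(\pi) = h^{\delta(\pi)} + \beta P^{\delta(\pi)} a^\delta(\pi P^{\delta(\pi)}) \qquad (5.2.7)$$

has a solution. Recall that the space of bounded maps from 
$\Pi_\infty$ to $A$, $B(\Pi_\infty; A)$, is a Banach space with the norm 

$$\|a(\cdot)\| = \sup_{\pi \in \Pi_\infty} \|a(\pi)\|_\infty$$

where 

$$\|a\|_\infty = \max_{1 \le i \le n} |a_i|. \qquad (5.2.9)$$

Moreover, since each $P^{\delta(\cdot)}$ is a stochastic matrix, 

$$\left\| h^{\delta(\cdot)} + \beta P^{\delta(\cdot)} a^1(\cdot) - h^{\delta(\cdot)} - \beta P^{\delta(\cdot)} a^2(\cdot) \right\| = \beta \left\| P^{\delta(\cdot)} \left( a^1(\cdot) - a^2(\cdot) \right) \right\| \le \beta \left\| a^1(\cdot) - a^2(\cdot) \right\|$$

so that a unique solution to (5.2.7) exists by the contraction mapping 
theorem. 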

The solution to (5.2.7) will not necessarily be piecewise linear 
and concave. Suppose, however, that there are sets $R_1, R_2, \ldots, R_j$ 
that partition $\Pi_\infty$, and satisfy the following two conditions: 

(a) for each $\pi \in R_i$, $\delta(\pi) = \gamma_i$; 

(b) each $\pi \in R_i$ satisfies $\pi P^{\gamma_i} \in R_{\nu(i)}$. 

Then (5.2.7) reduces to the system of equations 

$$a_i = h^{\gamma_i} + \beta P^{\gamma_i} a_{\nu(i)} \qquad (5.2.11)$$

for $i = 1, 2, \ldots, j$. The existence and uniqueness of a solution to 
(5.2.11) can be established by a contraction mapping argument similar 
to those used previously. 
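
Because the finite system is itself a contraction, it can be solved by straightforward fixed-point iteration. The following sketch assumes three regions with the successor map 1 to 2, 2 to 3, 3 to 3 (written zero-based in the code); the cost vectors, transition matrices and discount rate are hypothetical values chosen only to make the fixed point easy to verify by hand.

```python
def solve_region_vectors(laws, nu, beta, sweeps=200):
    """laws[i] = (h, P) is the control law data used on region i and nu[i] is
    the index of the region entered from region i. Iterates
    a_i <- h_i + beta * P_i a_{nu(i)} from a = 0."""
    n = len(laws[0][0])
    a = [[0.0] * n for _ in laws]
    for _ in range(sweeps):
        a = [[h[r] + beta * sum(P[r][c] * a[nu[i]][c] for c in range(n))
              for r in range(n)]
             for i, (h, P) in enumerate(laws)]
    return a


# Hypothetical data for three regions of a two-state problem.
identity = [[1.0, 0.0], [0.0, 1.0]]
swap = [[0.0, 1.0], [1.0, 0.0]]
laws = [((1.0, 0.0), identity),
        ((0.0, 1.0), swap),
        ((1.0, 2.0), identity)]
nu = [1, 2, 2]
a = solve_region_vectors(laws, nu, beta=0.5)
```

With these numbers the last region maps into itself, so its vector solves a = h + 0.5 a, giving (2, 4); substituting backward gives (2, 2) for the middle region and (2, 1) for the first.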

Of course, the reduction of the infinite dimensional equation 
(5.2.7) to the finite dimensional system of equations (5.2.11) is 
predicated on the assumption of an appropriate partition of $\Pi_\infty$. Sondik 
has found a class of policies for which this assumption is valid [Sol]. 
To define this class, it is necessary to define some notation. For a 
set $A \subset \Pi_\infty$, let 

$$T_\delta(A) = \left\{ \pi P^{\delta(\pi)} : \pi \in A \right\}, \qquad (5.2.12)$$

$$S_\delta^n = T_\delta(S_\delta^{n-1}), \quad n \ge 1, \qquad (5.2.13)$$

$$S_\delta^0 = \Pi_\infty, \qquad (5.2.14)$$

$$D_\delta = \text{closure}\, \{\pi : \delta \text{ is discontinuous at } \pi\}. \qquad (5.2.15)$$

A policy $\delta$ is finitely transient if and only if there is an integer 
m such that 

$$D_\delta \cap S_\delta^m = \emptyset \qquad (5.2.16)$$

($\emptyset$ = null set). 

In his thesis [Sol], Sondik establishes two important properties 
of finitely transient policies: 

(a) A partition with the desired properties 
exists if $\delta$ is finitely transient, and 

(b) The function $V^\delta$ may be approximated arbitrarily closely 
by $V^{\hat{\delta}}$, for some finitely transient policy $\hat{\delta}$. 

Since Sondik's arguments are readily applied to the FSFM problem, they 
are not repeated here. However, an example (Figure 5.2.2) is given 
to illustrate the basic idea. 


For the example. 
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p 6(7r) = 



TT. 


< 


1 

2 



Notice that $\delta$ is completely characterized by its effect on $\pi_1$, 
since $\pi_2 = 1 - \pi_1$. The sets $D^n$ are defined by 

$$D^0 = D_\delta, \qquad (5.2.17)$$

$$D^{n+1} = \{\pi : \pi P^{\delta(\pi)} \in D^n\}, \quad n \ge 0. \qquad (5.2.18)$$

Since $\pi P^{\delta(\pi)} \in R_2$ for $\pi \in R_1$, $\pi P^{\delta(\pi)} \in R_3$ for $\pi \in R_2$, 
and $\pi P^{\delta(\pi)} \in R_3$ for $\pi \in R_3$, it follows that $\nu(1) = 2$, $\nu(2) = 3$, 
and $\nu(3) = 3$. 


The procedure of constructing the regions $R_i$ from the sets $D^n$ 
applies to any finitely transient policy. Moreover, an approximate 
partition can be constructed for an arbitrary policy. The reader 
is referred to Sondik [Sol] for details. 

Although the value determination operation of Policy Iteration 
can be (approximately) carried out with the use of finitely transient 
policies, there remains the problem of implementing the policy 
improvement routine. The difficulty here is that the function $V^\delta$ is not 
piecewise linear or concave as is required for the Sondik algorithm. 
But $V^\delta$ will be piecewise linear if $\delta$ is finitely transient, and 
Sondik has shown that the concave hull of $V^\delta$ can be used in the 
policy improvement routine [Sol]. Again, Sondik's thesis [Sol] should 
be consulted for details. 

In this section, the numerical solution of the functional 
equation (5.2.1) has been considered. The emphasis has been on 
showing that an algorithm recently developed by Sondik for partially 
observable Markovian decision processes can be adapted for the FSFM 
problem. It should be pointed out that a FSFM problem will be considerably more difficult to solve than a corresponding partially observed Markovian decision process. This is because the FSFM problem requires that the observation and memory sets be included in the state set, and because the policies in the FSFM problem assign a control law to each state. Thus, a given FSFM problem will be much larger than an analogous partially observed Markovian decision process. Although this disadvantage is offset somewhat by the simple form of (5.2.1) (relative to the partially observed problem), it is nevertheless true that the technique outlined in this section is feasible only for simple special cases of the FSFM problem.
5.3 Example

In this section, a simple infinite horizon FSFM problem with discounted cost criterion is solved using the policy iteration algorithm outlined in the previous section. The solution illustrates that, in contradistinction to the usual discounted infinite horizon Markovian decision process, the optimal control law sequence can be non-stationary.

The problem has state set $\{1, 2\}$, and $\Gamma$ contains only the control law whose value is always 1 and the control law whose value is always 2. The parameters of the problem are


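$$P^1 = \begin{bmatrix} 1/2 & 1/2 \\ 1/2 & 1/2 \end{bmatrix}, \quad P^2 = \begin{bmatrix} 3/4 & 1/4 \\ 1/2 & 1/2 \end{bmatrix}, \quad \beta = \frac{1}{2}, \quad h^1 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}, \quad h^2 = \begin{bmatrix} 3/32 \\ 35/32 \end{bmatrix}$$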


Take the starting guess $\delta(\pi) \equiv 1$. Since $\delta$ has no discontinuities, it is certainly finitely transient. The algorithm is carried out below.

Policy Evaluation

Since $\delta$ has no discontinuities, $V^\delta(\pi) = \pi a^1$, where

$$a^1 = h^1 + \beta P^1 a^1,$$

that is, writing $a^1 = (a_1, a_2)'$,

$$4a_1 = a_1 + a_2$$

$$4a_2 = 4 + a_1 + a_2$$

Therefore,

$$a^1 = \begin{bmatrix} 1/2 \\ 3/2 \end{bmatrix}$$
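The two linear equations of the policy evaluation step can be checked with exact arithmetic. The following is a minimal sketch, assuming $\beta = 1/2$, $P^1$ with all entries $1/2$, and $h^1 = (0, 1)'$ (values read off from the equations $4a_1 = a_1 + a_2$ and $4a_2 = 4 + a_1 + a_2$); the helper name `solve_2x2` is hypothetical.

```python
from fractions import Fraction as F

def solve_2x2(A, b):
    """Solve A x = b for a 2x2 system by Cramer's rule (exact arithmetic)."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    x0 = (b[0] * A[1][1] - A[0][1] * b[1]) / det
    x1 = (A[0][0] * b[1] - b[0] * A[1][0]) / det
    return [x0, x1]

# Assumed problem data for the starting policy delta(pi) = 1.
beta = F(1, 2)
P1 = [[F(1, 2), F(1, 2)], [F(1, 2), F(1, 2)]]
h1 = [F(0), F(1)]

# a = h + beta * P a   is the same as   (I - beta * P) a = h.
A = [[(1 if i == j else 0) - beta * P1[i][j] for j in range(2)] for i in range(2)]
a1 = solve_2x2(A, h1)
print(a1)
```

The same routine applies to any two-state control law; larger state sets would need a general linear solver.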


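Policy Improvement

Let $\hat\delta(\pi) = \hat\gamma$ if

$$\pi h^{\hat\gamma} + \beta V^\delta(\pi P^{\hat\gamma}) = \min_{\gamma \in \Gamma} \left[ \pi h^\gamma + \beta V^\delta(\pi P^\gamma) \right].$$

Since $V^\delta(\pi) = \pi a^1$, this is equivalent to

$$\pi h^{\hat\gamma} + \beta \pi P^{\hat\gamma} a^1 = \min_{\gamma \in \Gamma} \left[ \pi h^\gamma + \beta \pi P^\gamma a^1 \right].$$

Since $\Gamma$ has only two elements, it is only necessary to check

$$\pi h^1 + \beta \pi P^1 a^1 < \pi h^2 + \beta \pi P^2 a^1.$$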
Test for Convergence

To test for convergence, it is necessary to check

$$V^{\hat\delta}(\pi) = \min_{\gamma \in \Gamma} \left[ \pi h^\gamma + \beta V^{\hat\delta}(\pi P^\gamma) \right]$$

for all $\pi \in \Pi_\infty$.^1

Case 1: $\hat\delta(\pi) = 1$

$$V^{\hat\delta}(\pi) = \pi a^1$$

But

$$\pi a^1 < \pi h^2 + \beta \pi P^2 a^1$$

for such $\pi$ by the computation above.

Case 2: $\hat\delta(\pi) = 2$

$$V^{\hat\delta}(\pi) = \pi a^2$$

But $\pi a^2 \le \pi h^1 + \beta \pi P^1 a^1$ as above.

Therefore the policy iteration algorithm has converged in a single step.

Notice if $\pi(0) = (1 \;\; 0)$, then $\hat\delta(\pi(0)) = 2$. However, $\pi(1) = (3/4 \;\; 1/4)$, so that $\hat\delta(\pi(1)) = 1$. Moreover, $\pi(2) = (1/2 \;\; 1/2) = \pi(t)$ for all $t \ge 2$, so that $\hat\delta(\pi(t)) = 1$ for all $t \ge 2$. Thus, the optimal control law sequence is non-stationary!
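The state-probability trajectory behind this observation can be traced numerically. The sketch below assumes $P^1$ has all entries $1/2$ and that the first row of $P^2$ reads $(3/4, 1/4)$; the latter is an assumed reading of the problem data, and the names `P`, `step`, and `controls` are illustrative only. Because both rows of $P^1$ are identical, the trajectory settles at $(1/2, 1/2)$ as soon as control law 1 is applied, whatever the first row of $P^2$ is.

```python
from fractions import Fraction as F

# Transition matrices indexed by control law (P[2] first row is an assumed reading).
P = {
    1: [[F(1, 2), F(1, 2)], [F(1, 2), F(1, 2)]],
    2: [[F(3, 4), F(1, 4)], [F(1, 2), F(1, 2)]],
}

def step(pi, control):
    """One step of pi(t+1) = pi(t) P^control."""
    M = P[control]
    return [sum(pi[i] * M[i][j] for i in range(2)) for j in range(2)]

# Control law sequence described in the text: 2 first, then 1 from then on.
controls = [2, 1, 1, 1]
pi = [F(1), F(0)]
trajectory = [pi]
for c in controls:
    pi = step(pi, c)
    trajectory.append(pi)

for t, state in enumerate(trajectory):
    print(t, state)
```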


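^1 In general, a criterion such as

$$\sup_{\pi \in \Pi_\infty} \left| V^{\hat\delta}(\pi) - \min_{\gamma \in \Gamma} \left[ \pi h^\gamma + \beta V^{\hat\delta}(\pi P^\gamma) \right] \right| \le \epsilon$$

would be checked.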



CHAPTER VI 


EXAMPLE : HYPOTHESIS TESTING WITH 1-BIT MEMORY 

In this chapter, a problem of sequential hypothesis testing with a 1-bit memory is considered. The problem is not of enormous intrinsic interest, although substantial work has appeared in the literature [Col, Hel, Fll, Chal, Co2, Hil]. However, the problem illustrates the use and limitations of control-theoretic methods in the design of information-handling systems.

6.1 Introduction 

In the first five chapters of this thesis, some of the most important 
theoretical and algorithmic results of modern optimal control theory 
have been applied to the FSFM problem. It has been pointed out that some 
of the crucial memory management and communication tasks of information 
handling systems can be examined within this format. It is felt that 
the establishment of this framework is a contribution of this research. 

In the previous chapters, simple examples have been given to illustrate 
the use and properties of the theorems and algorithms derived. In this 
chapter, a more substantial example is studied. 

Of course, the FSFM framework is not the only way that information 
handling problems can be treated. In particular, both information 
theory [Shi] and the theory of formal languages [Hopl] deal with this 
important issue. 

The relationship between information theory and non-classical stochastic control theory (which includes the FSFM problem) is clarified by the following statement of Witsenhausen [W3]:

"The latter [information theory] deals with an essentially simpler problem, because the transmission of information is considered independently of its use, long periods of use of a transmission channel are assumed, and delays are ignored."^1

Thus in information theory, one does not usually pose the question: "What is the best code of block length n for a given source and channel?" Instead, one asks "For a given source and information channel, what is the best that a code of block length n can do?" Of course, the bounds obtained as the answer to the latter question throw considerable light on the former, and thus information theory has had considerable practical impact. Moreover, obtaining an answer to the first question seems computationally impossible. But the first question is still of importance, and it is of interest to examine a framework in which the question can be raised, even if it cannot be answered. The hypothesis-testing problem considered in this section is of the same nature as the first question, but is considerably simpler. Thus the problem of this chapter is studied for paradigmatic rather than pragmatic purposes.

The theory of formal languages deals with more qualitative questions than the quantitative optimization considered in the FSFM problem. Thus a typical question in formal language theory is "What class of languages is accepted by a particular class of finite state machines?". This question turns out to be intimately associated with the problem considered in this chapter.

^1 An exception to the last point is the recent paper of Krich and Berger [Krl].
6.2 Formulation

Suppose that $\{x_i\}$ is a sequence of independent, identically distributed random variables with

$$p(x_i = 1) = p, \quad p(x_i = 0) = q. \qquad (6.2.1)$$

The hypotheses

$$H_0 : p = p_0 \qquad (6.2.2)$$

$$H_1 : p = p_1 \qquad (6.2.3)$$

are to be tested against one another. A Bayesian viewpoint is adopted, so that there are a priori probabilities $\lambda_0$ for $H_0$ and $\lambda_1$ for $H_1$ ($\lambda_0 + \lambda_1 = 1$), and the cost criterion is the probability of error.

If $x_i$, i = 0, 1, ..., T-1, is observed, and the decision is based on these observations, it is well known that the optimal decision is a likelihood ratio test [Val]. Moreover, a sufficient statistic is the number of ones (or zeros) observed [Lil]. Storing this number requires a memory with no more than $\log_2 T$ bits.

An alternative formulation is to assume that only a given memory with less than $\log_2 T$ bits is available. For example, suppose that only one bit is available. Define the sets

$$X_t = \{0, 1\} \qquad (6.2.4)$$

with corresponding fields

$$\mathcal{X}_t = \{\emptyset, \{0\}, \{1\}, X_t\} \qquad (6.2.5)$$

for t = 0, 1, 2, ..., T-1. The memory sets are

$$M_t = \{0, 1\} \qquad (6.2.6)$$

with corresponding fields

$$\mathcal{M}_t = \{\emptyset, \{0\}, \{1\}, M_t\} \qquad (6.2.7)$$

for t = 1, 2, ..., T. A set

$$U = \{0, 1\} \qquad (6.2.8)$$

of terminal decisions is also given.

Let $\eta_t$, t = 1, 2, ..., T, and $\gamma_T$ denote functions

$$\eta_1 : X_0 \to M_1 \qquad (6.2.9)$$

$$\eta_t : M_{t-1} \times X_{t-1} \to M_t, \quad t = 2, 3, \ldots, T \qquad (6.2.10)$$

$$\gamma_T : M_T \to U. \qquad (6.2.11)$$

The functions $\eta_t$ are the memory update functions, and the function $\gamma_T$ is the terminal decision function.

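Given $\eta_1, \eta_2, \ldots, \eta_T$, let

$$\hat\eta_t : X_0 \times X_1 \times \cdots \times X_{t-1} \to M_t$$

be defined as follows.

$$\hat\eta_1 : X_0 \to M_1, \quad x_0 \mapsto \eta_1(x_0) \qquad (6.2.12)$$

and

$$\hat\eta_t : X_0 \times X_1 \times \cdots \times X_{t-1} \to M_t \qquad (6.2.13)$$

$$(x_0, x_1, \ldots, x_{t-1}) \mapsto \eta_t(\hat\eta_{t-1}(x_0, x_1, \ldots, x_{t-2}), x_{t-1})$$

for t = 2, 3, ..., T. $\hat\eta_T$ is the memory structure induced by the memory update functions $\eta_1, \ldots, \eta_T$.

Define the product space X and product field $\mathcal{X}$ by

$$X = X_0 \times X_1 \times \cdots \times X_{T-1}, \qquad (6.2.14)$$

$$\mathcal{X} = \mathcal{X}_0 \times \mathcal{X}_1 \times \cdots \times \mathcal{X}_{T-1}. \qquad (6.2.15)$$

Then $\hat\eta_T$ is a map from X to $M_T$. Define

$$\hat\gamma_T : X_0 \times X_1 \times \cdots \times X_{T-1} \to U \qquad (6.2.16)$$

as the composition of $\gamma_T$ and $\hat\eta_T$,

$$\hat\gamma_T = \gamma_T \circ \hat\eta_T. \qquad (6.2.17)$$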


Each hypothesis induces a probability denoted by $p(\cdot \mid H_0)$ or $p(\cdot \mid H_1)$ on $\mathcal{X}$. (This probability is completely specified by the condition that the probability of a point with m ones and n zeros is $p^m (1-p)^n$ where $p = p_0$ or $p = p_1$ according to $H = H_0$ or $H = H_1$.) Define the subsets $S_0$ and $S_1$ of X by the equations

$$S_0 = \hat\gamma_T^{-1}(0) \qquad (6.2.18)$$

$$S_1 = \hat\gamma_T^{-1}(1) \qquad (6.2.19)$$

The probability of error corresponding to $\hat\gamma_T$ is

$$P_e(\hat\gamma_T) = p(S_0 \mid H_1) \lambda_1 + p(S_1 \mid H_0) \lambda_0 \qquad (6.2.20)$$

where $\lambda_0 + \lambda_1 = 1$. ($\lambda_0$ is the a priori probability that $H_0$ is true.)

The problem of hypothesis testing with 1-bit memory considered in this chapter is to find

$$\min_{\hat\gamma_T} P_e(\hat\gamma_T) \qquad (6.2.21)$$

and the functions $\eta_1, \eta_2, \ldots, \eta_T, \gamma_T$ defining the minimizing $\hat\gamma_T$.

Several problems closely related to (6.2.21) have been considered in the literature. Cover [Col] has considered the preceding problem for the limiting case $T \to \infty$. Hellman and Cover [Hel] have considered the infinite time problem when attention is limited to time invariant, but possibly randomized, memory updates. Flower [Fll] has considered the finite time problem but with attention again restricted to time invariant, although possibly randomized, memory updates.
6.3 Preliminary Analysis

For fixed $\hat\eta_T$, the problem is easily solved. Let

$$\mathcal{F}_T(\hat\eta_T) = \hat\eta_T^{-1}(\mathcal{M}_T) = \{\emptyset, E, \bar E, X\} \qquad (6.3.1)$$

where

$$E = \hat\eta_T^{-1}(1) \qquad (6.3.2)$$

$$\bar E = \hat\eta_T^{-1}(0) \qquad (6.3.3)$$

is the information field induced by the memory structure. Then the Bayes optimal decision is determined by the condition that the a posteriori probability be maximized [Val]. Therefore, if $m_T = 1$, choose $H_0$ ($\gamma_T(1) = 0$) if

$$p(E \mid H_0) \lambda_0 > p(E \mid H_1) \lambda_1 \qquad (6.3.4)$$

and choose $H_1$ ($\gamma_T(1) = 1$) if

$$p(E \mid H_1) \lambda_1 > p(E \mid H_0) \lambda_0. \qquad (6.3.5)$$

Similarly, if $m_T = 0$, choose $H_0$ ($\gamma_T(0) = 0$) if

$$p(\bar E \mid H_0) \lambda_0 > p(\bar E \mid H_1) \lambda_1 \qquad (6.3.6)$$

and choose $H_1$ ($\gamma_T(0) = 1$) if

$$p(\bar E \mid H_1) \lambda_1 > p(\bar E \mid H_0) \lambda_0. \qquad (6.3.7)$$

Thus the problem reduces to finding the optimal memory structure, or equivalently, the optimal information field that can be obtained from some memory structure.

It is useful at this point to define the notion of a rectangle. A rectangle $E \subset X_0 \times X_1 \times \cdots \times X_{T-1}$ is a set of the form

$$E = E_0 \times E_1 \times \cdots \times E_{T-1} \qquad (6.3.8)$$

where $E_0 \in \mathcal{X}_0$, $E_1 \in \mathcal{X}_1$, ..., $E_{T-1} \in \mathcal{X}_{T-1}$. The sets $E_t$ are the sides of E.

Lemma 6.3.1

If $E \subset X$ is a rectangle, then there is a memory structure $\hat\eta_T$ such that

$$E = \hat\eta_T^{-1}(1). \qquad (6.3.9)$$

(The memory structure is said to realize E.)

Proof

Let $E = E_0 \times E_1 \times \cdots \times E_{T-1}$. Define

$$\eta_1 = \begin{cases} 1 & x_0 \in E_0 \\ 0 & x_0 \notin E_0 \end{cases} \qquad (6.3.10)$$

$$\eta_t = \begin{cases} 1 & x_{t-1} \in E_{t-1} \text{ and } m_{t-1} = 1 \\ 0 & \text{otherwise} \end{cases} \qquad (6.3.11)$$

for t = 2, 3, ..., T. Obviously,

$$(x_0, x_1, \ldots, x_{T-1}) \in \hat\eta_T^{-1}(1)$$

if and only if $x_0 \in E_0$, $x_1 \in E_1$, ..., and $x_{T-1} \in E_{T-1}$. But the latter condition is equivalent to $(x_0, x_1, \ldots, x_{T-1}) \in E_0 \times E_1 \times \cdots \times E_{T-1} = E$.
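The construction in the proof can be exercised by brute force. The sketch below is illustrative only: the function name `realize_rectangle` and the sample sides are assumptions, and each side is represented as a Python set contained in {0, 1}. It runs the update rules (6.3.10) and (6.3.11) on every binary string and collects those that leave a 1 in memory.

```python
from itertools import product

def realize_rectangle(sides):
    """Return the set of strings for which the 1-bit memory ends at 1.

    sides[t] is the side E_t of the rectangle, a subset of {0, 1}.
    """
    T = len(sides)
    accepted = set()
    for x in product((0, 1), repeat=T):
        m = 1 if x[0] in sides[0] else 0                    # eta_1, as in (6.3.10)
        for t in range(1, T):
            m = 1 if (x[t] in sides[t] and m == 1) else 0   # eta_t, as in (6.3.11)
        if m == 1:
            accepted.add(x)
    return accepted

# Example rectangle {1} x {0,1} x {0} for T = 3.
sides = [{1}, {0, 1}, {0}]
rectangle = set(product(*sides))
print(realize_rectangle(sides) == rectangle)
```

The memory bit here acts as a running conjunction: once an observation falls outside its side, the bit drops to 0 and stays there.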
There are eight possible relationships between $p_1$, $p_0$, $q_1$, $q_0$ (Figure 6.3.1). Because of the obvious symmetries involved, only

Case 1: $p_1 > q_0 > p_0 > q_1$

and

Case 2: $p_1 > p_0 > q_0 > q_1$

need be considered. For case 1, the following result is available.

Proposition 6.3.2

For case 1, there is an event for which the probability of error is minimum within the class of rectangles either of the form

$$\{0\}^m \times \{0,1\}^{T-m} \qquad (6.3.12)$$

or

$$\{1\}^n \times \{0,1\}^{T-n} \qquad (6.3.13)$$

for some integer m, $0 \le m \le T$, or some integer n, $0 \le n \le T$.

Proof

Let F be an arbitrary rectangle. If F has n sides that are {1}, m sides that are {0}, and $T-n-m \ge 0$ sides that are {0,1}, then
[Figure 6.3.1 The Possible Relationships Between $p_0$, $p_1$, $q_0$, $q_1$ (Excluding $p_1 = p_0$)]
$$p(F \mid H) = p^n q^m \qquad (6.3.14)$$

where $p = p_0$ or $p = p_1$ according to $H = H_0$ or $H = H_1$. There are three subcases to be examined.

Case 1a

Suppose that the optimal terminal decision function is $\gamma_T \equiv 1$ or $\gamma_T \equiv 0$. In this case, the event F is useless, and can be replaced by an arbitrary event E without increasing the probability of error.


Case 1b

$$\lambda_1 p_1^n q_1^m > \lambda_0 p_0^n q_0^m \qquad (6.3.15)$$

$$\lambda_1 (1 - p_1^n q_1^m) < \lambda_0 (1 - p_0^n q_0^m) \qquad (6.3.16)$$

In this case, the optimum decision for observation of F is $F \to H_1$, $\bar F \to H_0$. The corresponding probability of error is

$$P_e = \lambda_1 (1 - p_1^n q_1^m) + \lambda_0 p_0^n q_0^m. \qquad (6.3.17)$$

Since

$$\frac{p_1}{q_1} > 1, \qquad (6.3.18)$$

$$\frac{p_0}{q_0} < 1, \qquad (6.3.19)$$

it follows that

$$\lambda_1 p_1^{n+1} q_1^{m-1} > \lambda_0 p_0^{n+1} q_0^{m-1}. \qquad (6.3.20)$$

Moreover, since

$$\lambda_1 (1 - p_1^{n+1} q_1^{m-1}) = \lambda_1 (1 - p_1^n q_1^m) - \lambda_1 p_1^n q_1^m \left( \frac{p_1}{q_1} - 1 \right) \qquad (6.3.21)$$

$$\lambda_0 (1 - p_0^{n+1} q_0^{m-1}) = \lambda_0 (1 - p_0^n q_0^m) - \lambda_0 p_0^n q_0^m \left( \frac{p_0}{q_0} - 1 \right) \qquad (6.3.22)$$

and by (6.3.15), (6.3.16), (6.3.18), (6.3.19) it follows that

$$\lambda_1 (1 - p_1^{n+1} q_1^{m-1}) < \lambda_0 (1 - p_0^{n+1} q_0^{m-1}). \qquad (6.3.23)$$

The probability of error for an event E with one side {0} of F changed to {1} is therefore

$$P_e = \lambda_1 (1 - p_1^{n+1} q_1^{m-1}) + \lambda_0 p_0^{n+1} q_0^{m-1} \qquad (6.3.24)$$

which is less than the corresponding probability of error for the event F.


Case 1c

$$\lambda_0 p_0^n q_0^m > \lambda_1 p_1^n q_1^m \qquad (6.3.25)$$

$$\lambda_0 (1 - p_0^n q_0^m) < \lambda_1 (1 - p_1^n q_1^m) \qquad (6.3.26)$$

By an argument completely analogous to that for case 1b, it can be shown that an event E with a side {1} of F changed to a {0} has a lower probability of error.

The proposition is established as follows. If F satisfies case 1a, F may be replaced by an arbitrary event that satisfies case 1b or case 1c. If no such event exists, then any event, and in particular an event of the form (6.3.12) or (6.3.13), is optimal. If F satisfies case 1b, then (by an induction argument) all the sides {0} of F can be changed to sides {1} without increasing the probability of error. If F satisfies case 1c, then all the sides {1} of F can be changed to {0}.
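The proposition can be spot-checked numerically for a small horizon. The following sketch is not part of the proof; it assumes illustrative case 1 parameters $p_0 = 3/10$, $p_1 = 4/5$ (so that $p_1 > q_0 > p_0 > q_1$), equal priors, and T = 3, and the function names are hypothetical. It uses (6.3.14) to evaluate every rectangle by its counts (n, m) and applies the best terminal decision to each.

```python
from fractions import Fraction as F

def bayes_error(a, b, lam1, lam0):
    """Error under the best terminal decision, given a = p(F | H1), b = p(F | H0)."""
    # On F: deciding H1 costs lam0 * b, deciding H0 costs lam1 * a; same on the complement.
    return min(lam1 * a, lam0 * b) + min(lam1 * (1 - a), lam0 * (1 - b))

def rectangle_errors(T, p0, p1, lam0, lam1):
    """Map (n, m) -> probability of error for a rectangle with n sides {1}, m sides {0}."""
    q0, q1 = 1 - p0, 1 - p1
    errors = {}
    for n in range(T + 1):
        for m in range(T + 1 - n):
            a = p1**n * q1**m      # p(F | H1), from (6.3.14)
            b = p0**n * q0**m      # p(F | H0), from (6.3.14)
            errors[(n, m)] = bayes_error(a, b, lam1, lam0)
    return errors

errors = rectangle_errors(3, F(3, 10), F(4, 5), F(1, 2), F(1, 2))
best = min(errors.values())
# Rectangles of the form (6.3.12) or (6.3.13) have n == 0 or m == 0.
best_pure = min(e for (n, m), e in errors.items() if n == 0 or m == 0)
print(best, best_pure)
```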

Since a rectangular event corresponds to occurrence of a particular substring of $(x_0, x_1, \ldots, x_{T-1})$, it might seem that only rectangular events could be realized by a 1-bit memory. If this were true, then the problem would be solved (for case 1) since a complete class^1 of memory update functions would be those determining whether an event of the form (6.3.12) or (6.3.13) did or did not occur. However, certain non-rectangular events can be realized by a 1-bit memory.

There are $2^4 = 16$ possible functions $\eta_t : M_{t-1} \times X_{t-1} \to M_t$ for any t > 1, since there are four elements of $M_{t-1} \times X_{t-1}$ and two elements of $M_t$. However, symmetry considerations reduce the number of $\eta_t$ that need to be considered to the eight listed in Table 6.3.2.

Proposition 6.3.3

Given $\hat\eta_T$, either $\hat\eta_T$ or $1 - \hat\eta_T$ can be constructed from the eight memory update functions in Table 6.3.2, $T \ge 2$.

Proof

The proof proceeds by induction. Notice that any map $\eta : M \times X \to M$ is either in Table 6.3.2 or $1 - \eta$ is in Table 6.3.2.

^1 A complete class of memory update functions satisfies the condition: for any choice of $\lambda_0$, $p_0$, $p_1$, T, the optimum memory update function is in the class.


                 $(m_{t-1}, x_{t-1})$
              (0,0)  (0,1)  (1,0)  (1,1)    interpretation

  $\eta_t^1$    1      1      1      1      no information
  $\eta_t^2$    1      1      1      0      picks out one point of $M_{t-1} \times X_{t-1}$
  $\eta_t^3$    1      1      0      1      picks out one point of $M_{t-1} \times X_{t-1}$
  $\eta_t^4$    1      0      1      1      picks out one point of $M_{t-1} \times X_{t-1}$
  $\eta_t^5$    0      1      1      1      picks out one point of $M_{t-1} \times X_{t-1}$
  $\eta_t^6$    1      1      0      0      gives $1 - m_{t-1}$
  $\eta_t^7$    1      0      1      0      gives $1 - x_{t-1}$
  $\eta_t^8$    0      1      1      0      gives $(m_{t-1} + x_{t-1})$ mod 2

Table 6.3.2 Eight Possible Memory Update Functions






















There is a one-to-one correspondence between functions $\eta_2 : M_1 \times X_1 \to M_2$ and functions $\hat\eta_2 : X_0 \times X_1 \to M_2$, so that the proposition is true for T = 2 by the remark above.

Suppose Proposition 6.3.3 is true for arbitrary T. Note that $\hat\eta_{T+1} = \eta_{T+1} \circ (\hat\eta_T \times i_T)$, where $i_T : X_T \to X_T$ is the identity map on $X_T$. By assumption, either $\hat\eta_T$ or $1 - \hat\eta_T$ can be constructed from the table. If $\hat\eta_T$ can be so constructed, then $\hat\eta_{T+1}$ or $1 - \hat\eta_{T+1}$ can, since $\hat\eta_{T+1} = \eta_{T+1} \circ (\hat\eta_T \times i_T)$ and since $1 - \hat\eta_{T+1} = (1 - \eta_{T+1}) \circ (\hat\eta_T \times i_T)$. If $1 - \hat\eta_T$ can be so constructed, modify $\eta_{T+1}$ so that $\hat\eta_{T+1} = \eta_{T+1} \circ [(1 - \hat\eta_T) \times i_T]$. Then $\hat\eta_{T+1}$ or $1 - \hat\eta_{T+1}$ can be so constructed, since $1 - \hat\eta_{T+1} = (1 - \eta_{T+1}) \circ [(1 - \hat\eta_T) \times i_T]$ and either $\eta_{T+1}$ or $1 - \eta_{T+1}$ is in the table. The proposition is therefore valid by the principle of mathematical induction.

Suppose $\eta_1 = i_0$ (the identity on $X_0$) and $\eta_2 = \eta_2^8$. Then

$$\hat\eta_2^{-1}(1) = \{(0,1), (1,0)\} \qquad (6.3.27)$$

which is a non-rectangular event. The interpretation is that it is possible to determine the parity of the string $(x_0, x_1, \ldots, x_{T-1})$ with a 1-bit memory. This does not seem to be a very interesting thing to know, but it complicates the analysis greatly.
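The parity example is easy to reproduce. In this minimal sketch (the function name `final_memory` is illustrative), the first observation is stored unchanged and every later update adds the new observation to the memory bit modulo 2.

```python
from itertools import product

def final_memory(x):
    """Memory bit after processing the string x with mod-2 updates."""
    m = x[0]                 # eta_1: identity on the first observation
    for bit in x[1:]:
        m = (m + bit) % 2    # mod-2 update of the memory bit
    return m

# Event realized for T = 2: the strings that leave a 1 in memory.
event = {x for x in product((0, 1), repeat=2) if final_memory(x) == 1}
print(sorted(event))
```

For longer strings the same update keeps tracking the parity of the number of ones seen so far, which is why the realized event is not a rectangle.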

An efficient specification of

$$\Sigma_T = \{\hat\eta_T^{-1}(1) : \hat\eta_T \text{ a memory structure}\} \qquad (6.3.28)$$

would be useful. $\Sigma_T$ is not in general a field, but does contain $\emptyset$, X, and is closed under complementation. The problem of specifying $\Sigma_T$ is precisely the problem of determining the languages that can be accepted by a two state time-varying automaton in T steps. Unfortunately, there appears to be little work on this problem available in the literature [Hopl, Arbi, Bol].
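For very small T the class of realizable events can simply be enumerated. The sketch below is a brute-force illustration, not an efficient specification; the function name `realizable_events` and the table encoding of the update functions are assumptions made here. Each update function is stored as a tuple of output bits, indexed by $x_0$ for the first update and by $2m + x$ for the later ones.

```python
from itertools import product

def realizable_events(T):
    """All events (sets of strings leaving a 1 in memory) for a 1-bit memory."""
    strings = list(product((0, 1), repeat=T))
    first_updates = list(product((0, 1), repeat=2))   # eta_1 as a table over x_0
    later_updates = list(product((0, 1), repeat=4))   # eta_t as a table over (m, x)
    events = set()
    for eta1 in first_updates:
        for etas in product(later_updates, repeat=T - 1):
            accepted = []
            for x in strings:
                m = eta1[x[0]]
                for t in range(1, T):
                    m = etas[t - 1][2 * m + x[t]]
                if m == 1:
                    accepted.append(x)
            events.add(frozenset(accepted))
    return events

sigma3 = realizable_events(3)
everything = frozenset(product((0, 1), repeat=3))
majority = frozenset(x for x in everything if sum(x) >= 2)
print(len(sigma3), majority in sigma3)
```

In this enumeration the majority event for T = 3 is the union of two realizable events, {(1,1,1), (0,1,1), (1,0,1)} and the rectangle {(1,1,0)}, yet is not itself realizable, which is one concrete way to see that the class is not a field.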

The analysis of this section, while not conclusive, suggests the 
following conjecture. 

Conjecture 

The event for which the probability of error is minimum is a 
rectangular event either of the form 

$$\{0\}^m \times \{0,1\}^{T-m} \qquad (6.3.29)$$

or

$$\{1\}^n \times \{0,1\}^{T-n}. \qquad (6.3.30)$$

In the next section, the problem will be reformulated as a FSFM 
stochastic control problem. The minimum principle will be used to find 
events superior to those of the form (6.3.29), (6.3.30). Thus, the 
above conjecture is false. 
6.4 Application of the Minimum Principle

The development in the previous section proceeded independently from the remainder of the thesis. In this section, the problem of hypothesis testing with 1-bit memory is recast to fit within the FSFM format. The utility of the FSFM minimum principle will be illustrated by the derivation of a memory update scheme to serve as a counterexample to the conjecture of the previous section.

Let the variables $x_1(t)$, $x_2(t)$, $u(t)$, $w_0(t)$, $w_1(t)$, $v(t)$ take their values in the set {0,1}. Suppose that

$$x_1(0), w_0(1), w_1(1), \ldots, w_0(T+1), w_1(T+1)$$

is a sequence of independent random variables. Assume that $x_1(0)$, $w_0(t)$, $w_1(t)$ take the value 1 with respective probabilities $\lambda_1$, $p_0$, $p_1$ and let $\lambda_0 = 1 - \lambda_1$, $q_0 = 1 - p_0$, and $q_1 = 1 - p_1$. Moreover, $x_2(0) = 0$ and $m(0) = 0$.

Let state equations

$$x_1(t) = x_1(t-1) \qquad (6.4.1)$$

$$x_2(t) = (1 - x_1(t-1))\, w_0(t-1) + x_1(t-1)\, w_1(t-1) \qquad (6.4.2)$$

$$m(t) = v(t) \qquad (6.4.3)$$

be defined. Then $x_2(t)$, t = 1, 2, ..., T+1 is a sequence of independent zero-one random variables with probability p of one and probability q of zero. With probability $\lambda_0$, $p = p_0$, and with probability $\lambda_1$, $p = p_1$.


The memory is updated by

$$v(t) = \eta_t(m(t-1), x_2(t-1)) \qquad (6.4.4)$$

for t = 1, 2, ..., T+1 and u(T+2) is specified by

$$u(T+2) = \gamma_{T+2}(m(T+1)). \qquad (6.4.5)$$

The cost is

$$J = (x_1(T+1) - u(T+2))^2. \qquad (6.4.6)$$

Since $x_1(T+1), u(T+2) \in \{0,1\}$, the expectation of J under the distribution defined by $\eta_1, \ldots, \eta_{T+1}$ and $\gamma_{T+2}$ is simply the probability of error.

Figure 6.4.1 illustrates the sequence of events.

When the appropriate identifications are made, the preceding problem can be shown to be equivalent to a FSFM problem. However, it is straightforward to write down the equivalent deterministic problem from the equations above.

Notice that the state set is {0,1} x {0,1} x {0,1}. This is equivalent to the state set $X_\infty$ = {1, 2, 3, 4, 5, 6, 7, 8} when the identifications of Figure 6.4.2 are made. Let $V_\infty$ = {0,1}. Then the restriction on $\eta_t : X_\infty \to V_\infty$ is that $\eta_t$ be constant on the sets of the partition

$$\{\{1,5\}, \{2,6\}, \{3,7\}, \{4,8\}\} \qquad (6.4.7)$$

of $X_\infty$. (This corresponds to the fact that $\eta_t$ in (6.4.4) cannot depend on $x_1(t)$.) Similarly, $\gamma_{T+2} : X_\infty \to U_\infty$ = {0,1} must be constant on the sets of the partition

$$\{\{1,2,5,6\}, \{3,4,7,8\}\} \qquad (6.4.8)$$
t = 0:     transition to $x_1(0)$, $x_2(0)$
           observation of $x_2(0)$            (no information)
           memory update v(1)                 (arbitrary)
t = 1:     transition to $x_1(1)$, $x_2(1)$
           observation of $x_2(1)$            (observation 1)
           memory update v(2)
  ...
t = T-1:   transition to $x_1(T-1)$, $x_2(T-1)$
           observation of $x_2(T-1)$          (observation T-1)
           memory update v(T)
t = T:     transition to $x_1(T)$, $x_2(T)$
           observation of $x_2(T)$            (observation T)
           memory update v(T+1)
t = T+1:   transition to $x_1(T+1)$, $x_2(T+1)$
           choice of control u(T+2)
           observation of $x_2(T+1)$
           memory update v(T+2)               (arbitrary)

Figure 6.4.1 Sequence of Events

[Figure 6.4.2: binary tree branching first on $x_1$, then on m(t), then on $x_2(t)$, with the eight leaves identified with the states 1 through 8]

Figure 6.4.2 Definition of the State Set X for the FSFM Problem
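of $X_\infty$.

From Figure 6.4.2 and the problem specification, it is easy to compute the parameters of the equivalent deterministic problem. Let $P^v_{ij}$ be the probability of a transition from state i to state j when the memory update function is identically equal to v.

$$P^0 = \begin{bmatrix}
q_0 & p_0 & 0 & 0 & 0 & 0 & 0 & 0 \\
q_0 & p_0 & 0 & 0 & 0 & 0 & 0 & 0 \\
q_0 & p_0 & 0 & 0 & 0 & 0 & 0 & 0 \\
q_0 & p_0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & q_1 & p_1 & 0 & 0 \\
0 & 0 & 0 & 0 & q_1 & p_1 & 0 & 0 \\
0 & 0 & 0 & 0 & q_1 & p_1 & 0 & 0 \\
0 & 0 & 0 & 0 & q_1 & p_1 & 0 & 0
\end{bmatrix}$$

$$P^1 = \begin{bmatrix}
0 & 0 & q_0 & p_0 & 0 & 0 & 0 & 0 \\
0 & 0 & q_0 & p_0 & 0 & 0 & 0 & 0 \\
0 & 0 & q_0 & p_0 & 0 & 0 & 0 & 0 \\
0 & 0 & q_0 & p_0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & q_1 & p_1 \\
0 & 0 & 0 & 0 & 0 & 0 & q_1 & p_1 \\
0 & 0 & 0 & 0 & 0 & 0 & q_1 & p_1 \\
0 & 0 & 0 & 0 & 0 & 0 & q_1 & p_1
\end{bmatrix}$$

Similarly,

$$\pi_0 = [\lambda_0 \;\; 0 \;\; 0 \;\; 0 \;\; \lambda_1 \;\; 0 \;\; 0 \;\; 0]$$

(All other terms in the cost are zero.)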
At this point, attention is restricted to the special case T = 3, $p_0 = 1/4$, $p_1 = 3/4$ and $\lambda_0 = \lambda_1 = 1/2$. The optimality of the trial solution

$\gamma_5^*$ : {1,2,5,6} $\to$ 0, {3,4,7,8} $\to$ 1

$\eta_4^*$ : {1,2,3,5,6,7} $\to$ 0, {4,8} $\to$ 1
$\eta_3^*$ : {1,2,3,5,6,7} $\to$ 0, {4,8} $\to$ 1
$\eta_2^*$ : {1,3,5,7} $\to$ 0, {2,4,6,8} $\to$ 1

will be tested by the FSFM minimum principle.

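The condition to be tested is

$$\pi^*(2)\, P^{\eta_3^*} \phi^*(3) \le \pi^*(2)\, P^{\eta_3} \phi^*(3) \quad \forall\, \eta_3.$$

Substituting $p_0 = 1/4$, $p_1 = 3/4$, $\lambda_0 = \lambda_1 = 1/2$, the following expression for $\pi^*(2)$ is obtained.

$$\pi^*(2) = \left[ \tfrac{9}{32} \;\; \tfrac{3}{32} \;\; \tfrac{3}{32} \;\; \tfrac{1}{32} \;\; \tfrac{1}{32} \;\; \tfrac{3}{32} \;\; \tfrac{3}{32} \;\; \tfrac{9}{32} \right]$$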
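The value of $\pi^*(2)$ can be reproduced by propagating the initial distribution through the first two memory updates. The following is a sketch under stated assumptions: $p_0 = 1/4$, $p_1 = 3/4$, $\lambda_0 = \lambda_1 = 1/2$, states numbered so that states 1 to 4 have $x_1 = 0$ and, within each group of four, the order is (m, $x_2$) = (0,0), (0,1), (1,0), (1,1). This numbering is inferred from the partitions (6.4.7) and (6.4.8), and the function names are hypothetical. States are 0-indexed in the code.

```python
from fractions import Fraction as F

p0, p1 = F(1, 4), F(3, 4)        # assumed parameter values
lam0 = lam1 = F(1, 2)
q0, q1 = 1 - p0, 1 - p1

def transition_matrix(v):
    """P^v: 8x8 matrix when the memory update is identically v (0 or 1)."""
    P = [[F(0)] * 8 for _ in range(8)]
    for i in range(8):
        base = 0 if i < 4 else 4                 # x_1 never changes
        q, p = (q0, p0) if i < 4 else (q1, p1)
        P[i][base + 2 * v] = q                   # next x_2 = 0
        P[i][base + 2 * v + 1] = p               # next x_2 = 1
    return P

P = [transition_matrix(0), transition_matrix(1)]

def propagate(pi, eta):
    """One step of the state distribution: state i uses row i of P^{eta[i]}."""
    return [sum(pi[i] * P[eta[i]][i][j] for i in range(8)) for j in range(8)]

eta1 = [0] * 8                       # first update: identically 0 (arbitrary)
eta2 = [0, 1, 0, 1, 0, 1, 0, 1]      # second update: transmit the observation
pi0 = [lam0, 0, 0, 0, lam1, 0, 0, 0]
pi2 = propagate(propagate(pi0, eta1), eta2)
print(pi2)
```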




The minimizing $\hat\eta_3$ for $\pi^*(2)\, P^{\eta_3} \phi^*(3)$ is obtained as follows. Since

37 < h - « 3 a > - V 5 ’ - °- si "= e i? < w h < ■£?' h < & V 2) ■ 

$\hat\eta_3(3) = \hat\eta_3(4) = \hat\eta_3(6) = \hat\eta_3(7) = \hat\eta_3(8) = 1$. Since $\hat\eta_3 \ne \eta_3^*$, the sequence $\eta_1^*, \eta_2^*, \eta_3^*, \eta_4^*, \gamma_5^*$ cannot be optimal. However, it can be verified that $\eta_1^*, \eta_2^*, \hat\eta_3, \eta_4^*, \gamma_5^*$ does satisfy the necessary conditions.

The interpretation of this result is obtained with the aid of Figure 6.4.2. The map $\eta_1^* \equiv 0$ is arbitrarily chosen since the first observation contains no information. $\eta_2^*$ simply transmits the first observation. $\eta_3^*$, $\eta_4^*$ put a 1 in memory if the observation is 1 and the previous memory state is 1. The net result is a memory structure $\hat\eta_4^*$ that tests for the observation of three 1's. Thus

$$\{(1,1,1)\} = \hat\eta_3^{*-1}(1)$$

is the event realized.

In contrast, $\hat\eta_3$ puts a zero in the memory if the previous memory state was zero and the observation was zero. The net effect is a memory structure $\hat\eta$ formed from $(\eta_1^*, \eta_2^*, \hat\eta_3, \eta_4^*)$. The memory structure places a zero in memory if the first two observations were zeros. Otherwise, a zero or one is placed in memory according to whether the last observation is zero or one. Thus the non-rectangular event realized is

$$\hat\eta_3^{-1}(1) = \{(1,1,1), (0,1,1), (1,0,1)\}.$$

$\hat\eta_3^{-1}(1)$ is considerably closer to the (unrealizable) optimum event {(1,1,1), (0,1,1), (1,0,1), (1,1,0)} than $\hat\eta_3^{*-1}(1)$: only the event (1,1,0) is misclassified.

There are two noteworthy features of the preceding analysis. First, application of the minimum principle has resulted in a counterexample to the conjecture of the previous section. This is impressive since the problem certainly revolves around the determination of signaling control laws in both the formal and informal meanings of the term. As was pointed out in Chapter 3, the presence of signaling control laws indicates the absence of a universal extremal so that the minimum principle is not necessarily very helpful for this class of problems.

Second, the preceding example for T = 3 suggests more general memory structures for T > 3. For example, consider the event $E \subset X_0 \times X_1 \times \cdots \times X_5$,

E = {{l} x {1} x {1} x {0,1} x {0,1} x {1}}U {{1} x {1} x {1} x 

10} x To7> x {l}} 

where $\bar A$ is the complement of the set A. For $p_0 = 1/4$, $p_1 = 3/4$, $\lambda_0 = \lambda_1 = 1/2$ as before, detection of this event results in a probability of error

$$P_e = \frac{791}{2048} < \frac{832}{2048}$$

where $\frac{832}{2048}$ is the probability of error for the event constructed for the case T = 3.
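The T = 3 comparison can be checked by enumeration. The sketch below assumes $p_0 = 1/4$, $p_1 = 3/4$, $\lambda_0 = \lambda_1 = 1/2$, models the "put a 1 if both are 1" update as a logical AND and the "put a 0 only if both are 0" update as a logical OR, and decides $H_1$ exactly when the memory ends at 1; all names are illustrative. With the prior weights of (6.2.20) the three-ones event, the non-rectangular event, and the majority event give 19/64, 13/64, and 5/32 respectively. The unweighted sum of the two conditional error probabilities for the non-rectangular event comes out to 26/64 = 832/2048, which suggests (an inference from this computation, not a statement in the source) that the figures quoted above omit the prior weights.

```python
from fractions import Fraction as F
from itertools import product

p0, p1 = F(1, 4), F(3, 4)     # assumed parameter values
lam0 = lam1 = F(1, 2)

def prob(event, p):
    """Probability of a set of 0/1 strings when each bit is 1 with probability p."""
    return sum(p ** sum(x) * (1 - p) ** (len(x) - sum(x)) for x in event)

def run(x, updates):
    """Final memory bit: store the first observation, then apply each update."""
    m = x[0]
    for bit, f in zip(x[1:], updates):
        m = f(m, bit)
    return m

def AND(m, x):
    return m & x

def OR(m, x):
    return m | x

strings = list(product((0, 1), repeat=3))
trial = {x for x in strings if run(x, [AND, AND]) == 1}      # tests for three ones
improved = {x for x in strings if run(x, [OR, AND]) == 1}    # non-rectangular event
majority = {x for x in strings if sum(x) >= 2}               # not realizable with 1 bit

def weighted_error(event):
    """Decide H1 on the event and H0 otherwise; weights as in (6.2.20)."""
    return lam1 * (1 - prob(event, p1)) + lam0 * prob(event, p0)

def unweighted_error_sum(event):
    """Sum of the two conditional error probabilities, without prior weights."""
    return (1 - prob(event, p1)) + prob(event, p0)

for name, ev in [("trial", trial), ("improved", improved), ("majority", majority)]:
    print(name, sorted(ev), weighted_error(ev))
```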

Thus a sequence of events of decreasing probability of error can 
be constructed for T + 00 . However, verification of the optimality of 
these events requires the application of a sufficient condition of 
optimality such as the dynamic programming algorithm discussed in the 


next section. 


6.5 Application of Dynamic Programming

In section 2.5, it was stated that the equivalent deterministic problem to the FSFM problem was not always the most efficient such problem. In this section, this statement is justified by demonstrating that the hypothesis testing problem of this chapter is in fact equivalent to a deterministic problem with a two dimensional state space.^1 Although the dynamic programming equations for the two dimensional problem are considerably simplified, the analysis is still too difficult to be completed by hand.

Define the quantities

$$\alpha(t) = \mathrm{Prob}(\text{memory} = 1 \mid H_0) \qquad (6.5.1)$$

$$\beta(t) = \mathrm{Prob}(\text{memory} = 1 \mid H_1) \qquad (6.5.2)$$

for t = 0, 1, ..., T. Let $\theta_{ij}(t) = k$ mean that if m(t-1) = i, x(t-1) = j, then m(t) = k, where i, j, k $\in$ {0,1}. Then the state equations

$$\begin{bmatrix} \alpha(t) \\ \beta(t) \end{bmatrix} = f_t(\alpha(t-1), \beta(t-1); \theta(t)) \qquad (6.5.3)$$

can be written where

$$f_t(\alpha(t-1), \beta(t-1); \theta(t)) = \begin{bmatrix} \theta_{11}(t) p_0 + \theta_{10}(t) q_0 - \theta_{01}(t) p_0 - \theta_{00}(t) q_0 & 0 \\ 0 & \theta_{11}(t) p_1 + \theta_{10}(t) q_1 - \theta_{01}(t) p_1 - \theta_{00}(t) q_1 \end{bmatrix} \begin{bmatrix} \alpha(t-1) \\ \beta(t-1) \end{bmatrix} + \begin{bmatrix} \theta_{01}(t) p_0 + \theta_{00}(t) q_0 \\ \theta_{01}(t) p_1 + \theta_{00}(t) q_1 \end{bmatrix} \qquad (6.5.4)$$

^1 This fact was pointed out to the author by Dr. H. S. Witsenhausen.

Minimizing the probability of error is equivalent to minimizing $V_T(\alpha(T), \beta(T))$, where

$$V_T(\alpha, \beta) = \begin{cases} \frac{1}{2}(1 - \alpha + \beta) & \alpha > \beta \\ \frac{1}{2}(1 + \alpha - \beta) & \alpha < \beta \end{cases} \qquad (6.5.5)$$

The state equations (6.5.4) with the cost function (6.5.5) define a deterministic optimal control problem equivalent to the problem of hypothesis testing with 1-bit memory.

The dynamic programming backward algorithm for the optimal control problem is

$$V_{t-1}(\alpha, \beta) = \min_{\theta} V_t(f_t(\alpha, \beta; \theta)) \qquad (6.5.6)$$

where $V_T$ is given by (6.5.5). The optimal memory update functions are determined as follows. Let $\theta^\ell(t)$ be the minimizing control for $(\alpha, \beta) \in R_\ell(t)$, where $R_\ell(t)$ is a convex region with piecewise linear boundary.^1 Then the minimizing control law for $(\alpha, \beta) \in R_\ell(t)$ is defined by

$$\eta_t(i, j) = k \qquad (6.5.7)$$

where $\theta^\ell_{ij}(t) = k$.

^1 Such a region $R_\ell(t)$ for which the minimizing $\theta$ is constant exists by analysis similar to that of Chapter 4.
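One backward step of the recursion (6.5.6) can be sketched as follows, assuming $p_0 = 1/4$, $p_1 = 3/4$, equal priors, and the terminal cost written as $\frac{1}{2}(1 - |\alpha - \beta|)$, which agrees with (6.5.5) on both branches. The function names and the nested-tuple encoding of $\theta$ are illustrative choices, and the minimization is a plain search over the 16 possible memory updates at a single point $(\alpha, \beta)$ rather than a construction of the regions $R_\ell(t)$.

```python
from fractions import Fraction as F
from itertools import product

p0, p1 = F(1, 4), F(3, 4)     # assumed parameter values
q0, q1 = 1 - p0, 1 - p1

def f(alpha, beta, theta):
    """State equation (6.5.4); theta[i][j] is the new memory bit for (m, x) = (i, j)."""
    def update(prob_one, p, q):
        from_one = theta[1][1] * p + theta[1][0] * q    # P(new m = 1 | old m = 1)
        from_zero = theta[0][1] * p + theta[0][0] * q   # P(new m = 1 | old m = 0)
        return prob_one * from_one + (1 - prob_one) * from_zero
    return update(alpha, p0, q0), update(beta, p1, q1)

def V_T(alpha, beta):
    """Terminal cost (6.5.5) for equal priors."""
    return (1 - abs(alpha - beta)) / 2

def V_prev(alpha, beta, V_next):
    """One backward step of (6.5.6): minimize over all 16 memory updates."""
    thetas = [((a, b), (c, d)) for a, b, c, d in product((0, 1), repeat=4)]
    return min(V_next(*f(alpha, beta, th)) for th in thetas)

# Starting from an uninformative memory (alpha = beta = 0), one observation helps.
print(V_T(F(0), F(0)), V_prev(F(0), F(0), V_T))
```

Further backward steps would nest `V_prev` inside itself, at a cost that grows by a factor of 16 per stage for each evaluated point, which is in line with the remark that the computation quickly becomes impractical by hand.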

For the case $p_0 = 1/4$, $p_1 = 3/4$, $\lambda_0 = \lambda_1 = 1/2$, the dynamic programming algorithm has been carried out for t = T and t = T-1. The results are illustrated in Figures 6.5.1 and 6.5.2. Notice the functions $V_T$, $V_{T-1}$ are piecewise linear and concave, as might be expected from the analysis in Chapter 4.

Although obtaining a 2-dimensional equivalent deterministic optimal control problem simplifies application of the dynamic programming algorithm, the computation of $V_{T-2}$ is still too complicated to be carried out by hand. Solution by digital computer is required.
Figure 6.5.1 The Function $V_T$ (Note $\gamma_T$ : {1} $\to$ 1 means $\gamma_T(1) = 1$)
CHAPTER VII

SUMMARY, CONCLUSIONS, AND SUGGESTIONS
FOR FURTHER INVESTIGATION

This chapter summarizes the results of the thesis, with a 
brief discussion of the conclusions that can be drawn from the 
research. A list of possible topics for future investigation 
is included. 

7.1 Summary 

The thesis began in Chapter I with the formulation of a 
rather general problem in the design of engineering control systems . 
The formulation was intended to motivate the FSFM model studied in 
the remainder of the thesis. The FSFM problem is a non-classical 
stochastic control problem, and so the existing literature on this 
and other closely related topics was briefly surveyed. The chapter 
closed with a brief summary of the remaining chapters. 

The FSFM model was introduced in Chapter II. It was demonstrated that a number of apparently more general problems can be reduced to FSFM models, so that most of the features of the general engineering control system of Chapter I can be incorporated in the FSFM formulation. Then an example was given to illustrate the important signaling strategies that must be considered in non-classical stochastic control problems. Finally, a deterministic optimal control problem equivalent to the FSFM problem was derived.

In Chapter III, the FSFM minimum principle was stated and 
proved. A Kuhn extensive game model equivalent to the FSFM problem 
was obtained so that the notion of a signaling strategy could be 
precisely defined. The importance of this concept was established 
by proving the existence of a universal extremal for problems without 
signaling strategies (i.e., with perfect recall) . A numerical 
optimization algorithm, the person-by-person min-H algorithm , was 
derived based on the minimum principle. 

In Chapter IV dynamic programming was considered. As might be 
expected, dynamic programming is not a practical procedure for 
numerical optimization except for simple special cases. 

In Chapter V, the infinite horizon version of the FSFM model 
was formulated. The discounted cost criterion was considered since 
this criterion led to a well-defined equivalent deterministic 
problem. The Value and Policy Iteration methods were extended to 
the FSFM problem, as were algorithms of Sondik implementing these 
methods. 

A problem of hypothesis testing with 1-bit memory was considered 
in Chapter VI. Although an optimal solution was not obtained, use 
of the minimum principle suggested an interesting class of memory 
updates. This result provides some indication that control 
theoretic methods can be useful for design of information-handling 
systems. 

7.2 Conclusions 

The fundamental difficulty in non-classical stochastic control 
problems in general, and the FSFM problem in particular, is the 
occurrence of signaling strategies. This phenomenon, which does 
not occur in classical stochastic control, complicates the analysis 
in an essential way, since the choice of control laws at different 
time instants is tightly coupled. As a consequence, the min-H 
algorithm proposed for the numerical solution of FSFM models is 
not guaranteed to converge to the globally optimal solution. However, 
as illustrated in Chapter VI, the min-H algorithm in conjunction 
with some engineering judgement in the choice of the initial 
guess can be an effective tool. 

The applicability of the algorithms implementing dynamic 
programming is more limited. The basic difficulty here is classical; 
it is necessary to solve a high dimensional functional equation to 
implement dynamic programming. In spite of the large amount of 
work devoted to this problem, no generally applicable satisfactory 
procedure is available. Thus, the dynamic programming approach is 
appropriate only for problems with a rather small state set (around 
10 states at most). 

Even the min-H algorithm is not adequate to handle large scale 
engineering systems directly. The problem is basically combinatorial: 
all the observation, memory, and communication sets are lumped with 
the state set so that the state set becomes very large. For 
example, a system with 100 physical states and two controllers, 
each with a 10-state memory set and each observing an output 
that takes 10 values, requires a FSFM model with 1 million states! 
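
The count in this example can be checked with a few lines of 
arithmetic. The sketch below is illustrative only: it assumes that 
the FSFM state set is the Cartesian product of the physical state 
set with each controller's memory set and observation set, and all 
variable names are hypothetical. 

```python
from math import prod

# Sizes taken from the example in the text.
n_physical = 100      # physical states
n_controllers = 2
n_memory = 10         # memory states per controller
n_observation = 10    # output values seen by each controller

# Assumption: the FSFM state set is the product of the physical state
# set with every controller's memory set and observation set.
factor_sizes = [n_physical] + [n_memory, n_observation] * n_controllers
n_fsfm_states = prod(factor_sizes)

print(n_fsfm_states)  # 1000000
```

Each additional controller of this kind multiplies the state count by 
a further factor of 100, which is the combinatorial growth referred 
to above. 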

A number of techniques must be employed to handle such a 
problem. One generally applicable approach is to remove some of 
the redundancy associated with the FSFM representation of the 
problem by taking advantage of the factorization of the state set 
into the physical and memory sets. For example, notice that with 
the memory updates at a particular instant fixed in the above 
problem, transition to 990,000 of the states (those corresponding 
to memory states not chosen) is impossible. Thus, for computational 
work, it is better to retain the factorization of the state set 
into the physical state set and the memory set. Other factorizations 
may be possible in specific instances. 
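
A minimal sketch of the factored representation follows. It assumes 
that a state is stored as a tuple (x, m1, y1, m2, y2) of physical 
state, memories, and observations, and it reads "memory updates 
fixed" as meaning that the next memory pair (m1, m2) is already 
determined. The function name and the tuple layout are hypothetical. 

```python
from itertools import product

N_X, N_M, N_Y = 100, 10, 10  # physical, memory, observation set sizes


def successors(next_memory):
    """Yield the factored states (x, m1, y1, m2, y2) that remain possible
    once the memory updates have fixed next_memory = (m1, m2)."""
    m1, m2 = next_memory
    for x, y1, y2 in product(range(N_X), range(N_Y), range(N_Y)):
        yield (x, m1, y1, m2, y2)


total = N_X * (N_M * N_Y) ** 2
reachable = sum(1 for _ in successors((3, 7)))

print(reachable, total - reachable)  # 10000 990000
```

Only the physical and observation components need to be enumerated, 
so the work per step scales with 10,000 states rather than with the 
full 1 million. 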

An important technique in large scale systems theory is 
aggregation. As applied to the FSFM model, this technique consists 
of grouping states together into aggregate states and only 
considering transitions between the aggregate states. The resulting 
model may closely approximate the original model if the aggregate 
states are well chosen, and will be more tractable computationally. 
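
As a rough illustration of aggregation, the sketch below collapses a 
single fixed transition matrix onto a partition of its states. It 
assumes uniform weighting of the states within each aggregate state, 
which is only one of several possible choices, and it ignores the 
dependence of the transitions on the control laws. The function 
`aggregate` and the example matrix are hypothetical. 

```python
def aggregate(P, groups):
    """Aggregate a row-stochastic matrix P (list of lists) over `groups`,
    a list of lists of state indices that partition the state set.
    Assumption: states inside a group are weighted uniformly."""
    agg = []
    for A in groups:
        row = []
        for B in groups:
            # Total probability mass flowing from group A into group B.
            mass = sum(P[i][j] for i in A for j in B)
            row.append(mass / len(A))
        agg.append(row)
    return agg


P = [
    [0.5, 0.5, 0.0, 0.0],
    [0.25, 0.25, 0.5, 0.0],
    [0.0, 0.0, 0.5, 0.5],
    [0.0, 0.5, 0.25, 0.25],
]
groups = [[0, 1], [2, 3]]
P_agg = aggregate(P, groups)

print(P_agg)  # [[0.75, 0.25], [0.25, 0.75]]
```

The four-state matrix is replaced by a two-state one whose rows still 
sum to one. How closely it approximates the original depends on how 
the groups are chosen. 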

Another possibility involves utilizing any special structure 
that occurs in a particular large scale problem. Generally speaking, 
the special structure of the problem will be reflected in the fact 
that many state transitions will not be allowed. Thus, the 
associated state transition matrices will be sparse. A particular 
example of this situation has already been mentioned above in 
connection with the memory sets. Exploitation of the structural 
properties of the transition matrices requires a flexible 
representation. One possibility might be to store the transition matrices 
as a PL/1 data structure. 
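
One possible sparse representation is sketched below. It is written 
in Python rather than PL/1, with nested dictionaries standing in for 
the data structure: only the allowed transitions are stored, and a 
probability distribution is propagated by visiting those entries 
alone. The function `step` and the three-state example are 
hypothetical. 

```python
def step(dist, transitions):
    """Propagate a probability distribution one step.

    dist:        {state: probability}
    transitions: {state: {next_state: probability}}, storing only the
                 transitions that are actually allowed."""
    new = {}
    for s, p in dist.items():
        for t, q in transitions.get(s, {}).items():
            new[t] = new.get(t, 0.0) + p * q
    return new


transitions = {
    "a": {"a": 0.5, "b": 0.5},
    "b": {"c": 1.0},
    "c": {"c": 1.0},
}
dist = step({"a": 1.0}, transitions)
dist = step(dist, transitions)

print(dist)  # {'a': 0.25, 'b': 0.25, 'c': 0.5}
```

The cost of each step is proportional to the number of stored 
transitions rather than to the square of the number of states. 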

To summarize, study of the FSFM model was motivated by the 
problems of control and information in large scale systems. The 
FSFM model does provide a vehicle for the study of phenomena that 
occur in such systems. However, direct solution of large scale 
system problems by the algorithms of this thesis will not be possible, 
in general, due to limitations on the size of the state set for which 
the algorithms are computationally feasible. Techniques such as 
aggregation can be used to reduce a large scale system problem to 
a computationally feasible size, and any special structure of the 
problem should be exploited to mitigate the computational burden. 





7.3 Suggestions for Future Research 

The study of non-classical stochastic control problems is still 
at an early stage. Therefore, there are many possibilities for 
further investigation. Some of these are listed below. 

(1) Further study and refinement of the FSFM model. 

(a) Study of the interaction of communication and control 
in the FSFM context. 

(b) Study of the tradeoff between employing signaling 
strategies and providing additional communication 
channels. 

(c) Extension of the analysis of Chapter VI to problems 
with larger memories. 

(d) Specialization of the FSFM problem to the case in 
which the sets involved have an algebraic structure. 

(e) Determination of upper and lower bounds for the 
optimal cost without computing the optimal control 
laws. 

(2) Studies aimed at reducing the computational burden. 

(a) Exploitation of the structure of the FSFM state 
space as the product of the physical state set 
with other sets. 

(b) Replacement of the matrix representation of the FSFM 
model with one more suited for computational purposes 
(e.g., a PL/1 data structure). 

(c) Examination of the possibility of parallel computation 
in the min-H algorithm. 

(3) Application of the theory to specific problems. 

(a) Traffic networks [Houl]. 

(b) Computer communication networks [Kal]. 

(4) Extensions of the theory 

(a) To non-sequential stochastic control problems. 

(b) To FSFM games. 

(c) Generalization of the signaling strategy notion to 
continuous state spaces. 

(d) Study of linear designs for linear, quadratic, 
Gaussian problems by techniques similar to those 
developed for the FSFM problem. 